This notebook is a template containing each step you need to complete for the project.
Please fill in your code wherever there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.
Once you have completed all the code implementations, please export your notebook as an HTML file so the reviewers can view your code. Make sure all cell outputs are included.
File-> Export Notebook As... -> Export Notebook as HTML
There is a writeup to complete as well after all code implementation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either Markdown or PDF.
Completing the code template and writeup template will cover all of the rubric points for this project.
The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "stand out suggestions", you can include the code in this notebook and also discuss the results in the writeup file.
Below is an example of the steps to obtain the Kaggle API username and key. Each student will have their own username and key.
Download the kaggle.json file from your Kaggle account page and use the username and key it contains.
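The Kaggle CLI looks for the credentials at `~/.kaggle/kaggle.json` and requires the file not be world-readable. A minimal sketch of placing the downloaded credentials there (the username and key below are placeholders, not real values; substitute the contents of your own kaggle.json):

```python
import json
from pathlib import Path

# Placeholder credentials -- replace with the values from your own kaggle.json.
creds = {"username": "your_kaggle_username", "key": "your_api_key"}

kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(exist_ok=True)

cred_path = kaggle_dir / "kaggle.json"
cred_path.write_text(json.dumps(creds))

# The Kaggle CLI refuses credentials readable by other users,
# so restrict the file to owner read/write only.
cred_path.chmod(0o600)
```

Alternatively, upload your downloaded kaggle.json directly and move it into `~/.kaggle/` with the same permissions.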
ml.t3.medium instance (2 vCPU + 4 GiB)
Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)

!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
!pip install kaggle
# Without --no-cache-dir, smaller AWS instances may have trouble installing
Requirement already satisfied: pip in /usr/local/lib/python3.7/site-packages (21.3.1)
Collecting pip
Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.3.1
Uninstalling pip-21.3.1:
Successfully uninstalled pip-21.3.1
Successfully installed pip-22.3.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (59.4.0)
Collecting setuptools
Using cached setuptools-65.6.3-py3-none-any.whl (1.2 MB)
Collecting wheel
Using cached wheel-0.38.4-py3-none-any.whl (36 kB)
Installing collected packages: wheel, setuptools
Attempting uninstall: setuptools
Found existing installation: setuptools 59.4.0
Uninstalling setuptools-59.4.0:
Successfully uninstalled setuptools-59.4.0
Successfully installed setuptools-65.6.3 wheel-0.38.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting mxnet<2.0.0
Using cached mxnet-1.9.1-py3-none-manylinux2014_x86_64.whl (49.1 MB)
Collecting bokeh==2.0.1
Using cached bokeh-2.0.1-py3-none-any.whl
Requirement already satisfied: packaging>=16.8 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (21.3)
Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (4.0.1)
Requirement already satisfied: numpy>=1.11.3 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (1.19.1)
Requirement already satisfied: Jinja2>=2.7 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (3.0.3)
Requirement already satisfied: PyYAML>=3.10 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (5.4.1)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (2.8.2)
Requirement already satisfied: pillow>=4.0 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (8.4.0)
Requirement already satisfied: tornado>=5 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (6.1)
Requirement already satisfied: requests<3,>=2.20.0 in /usr/local/lib/python3.7/site-packages (from mxnet<2.0.0) (2.22.0)
Requirement already satisfied: graphviz<0.9.0,>=0.8.1 in /usr/local/lib/python3.7/site-packages (from mxnet<2.0.0) (0.8.4)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.7/site-packages (from Jinja2>=2.7->bokeh==2.0.1) (2.0.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from packaging>=16.8->bokeh==2.0.1) (3.0.6)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/site-packages (from python-dateutil>=2.1->bokeh==2.0.1) (1.16.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (2021.10.8)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (1.25.11)
Installing collected packages: mxnet, bokeh
Attempting uninstall: bokeh
Found existing installation: bokeh 2.4.2
Uninstalling bokeh-2.4.2:
Successfully uninstalled bokeh-2.4.2
Successfully installed bokeh-2.0.1 mxnet-1.9.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting autogluon
Downloading autogluon-0.6.1-py3-none-any.whl (9.8 kB)
Collecting autogluon.tabular[all]==0.6.1
Downloading autogluon.tabular-0.6.1-py3-none-any.whl (286 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 286.0/286.0 kB 124.9 MB/s eta 0:00:00
Collecting autogluon.timeseries[all]==0.6.1
Downloading autogluon.timeseries-0.6.1-py3-none-any.whl (103 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.0/103.0 kB 198.1 MB/s eta 0:00:00
Collecting autogluon.features==0.6.1
Downloading autogluon.features-0.6.1-py3-none-any.whl (59 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.0/60.0 kB 172.7 MB/s eta 0:00:00
Collecting autogluon.core[all]==0.6.1
Downloading autogluon.core-0.6.1-py3-none-any.whl (226 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 226.6/226.6 kB 222.9 MB/s eta 0:00:00
Collecting autogluon.multimodal==0.6.1
Downloading autogluon.multimodal-0.6.1-py3-none-any.whl (289 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 289.7/289.7 kB 226.8 MB/s eta 0:00:00
Collecting autogluon.vision==0.6.1
Downloading autogluon.vision-0.6.1-py3-none-any.whl (49 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.8/49.8 kB 163.5 MB/s eta 0:00:00
Collecting autogluon.text==0.6.1
Downloading autogluon.text-0.6.1-py3-none-any.whl (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.1/62.1 kB 124.8 MB/s eta 0:00:00
Requirement already satisfied: requests in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.6.1->autogluon) (2.22.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.6.1->autogluon) (3.5.0)
Requirement already satisfied: scikit-learn<1.2,>=1.0.0 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.6.1->autogluon) (1.0.1)
Collecting dask<=2021.11.2,>=2021.09.1
Downloading dask-2021.11.2-py3-none-any.whl (1.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 239.0 MB/s eta 0:00:00
Requirement already satisfied: pandas!=1.4.0,<1.6,>=1.2.5 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.6.1->autogluon) (1.3.4)
Collecting scipy<1.10.0,>=1.5.4
Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.1/38.1 MB 169.9 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.38.0 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.6.1->autogluon) (4.39.0)
Collecting distributed<=2021.11.2,>=2021.09.1
Downloading distributed-2021.11.2-py3-none-any.whl (802 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 802.2/802.2 kB 253.0 MB/s eta 0:00:00
Requirement already satisfied: boto3 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.6.1->autogluon) (1.20.17)
Collecting autogluon.common==0.6.1
Downloading autogluon.common-0.6.1-py3-none-any.whl (41 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.5/41.5 kB 138.0 MB/s eta 0:00:00
Collecting numpy<1.24,>=1.21
Downloading numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 170.2 MB/s eta 0:00:00
Collecting ray[tune]<2.1,>=2.0
Downloading ray-2.0.1-cp37-cp37m-manylinux2014_x86_64.whl (60.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.5/60.5 MB 165.5 MB/s eta 0:00:00
Collecting hyperopt<0.2.8,>=0.2.7
Downloading hyperopt-0.2.7-py2.py3-none-any.whl (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 235.8 MB/s eta 0:00:00
Requirement already satisfied: psutil<6,>=5.7.3 in /usr/local/lib/python3.7/site-packages (from autogluon.features==0.6.1->autogluon) (5.8.0)
Collecting evaluate<=0.3.0
Downloading evaluate-0.3.0-py3-none-any.whl (72 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72.9/72.9 kB 173.0 MB/s eta 0:00:00
Collecting transformers<4.24.0,>=4.23.0
Downloading transformers-4.23.1-py3-none-any.whl (5.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.3/5.3 MB 183.8 MB/s eta 0:00:00
Collecting nptyping<1.5.0,>=1.4.4
Downloading nptyping-1.4.4-py3-none-any.whl (31 kB)
Collecting jsonschema<=4.8.0
Downloading jsonschema-4.8.0-py3-none-any.whl (81 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.4/81.4 kB 203.0 MB/s eta 0:00:00
Collecting torchmetrics<0.9.0,>=0.8.0
Downloading torchmetrics-0.8.2-py3-none-any.whl (409 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 409.8/409.8 kB 169.6 MB/s eta 0:00:00
Collecting pytorch-metric-learning<1.4.0,>=1.3.0
Downloading pytorch_metric_learning-1.3.2-py3-none-any.whl (109 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.4/109.4 kB 192.2 MB/s eta 0:00:00
Collecting torch<1.13,>=1.9
Downloading torch-1.12.1-cp37-cp37m-manylinux1_x86_64.whl (776.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 776.3/776.3 MB 171.6 MB/s eta 0:00:00
Collecting timm<0.7.0
Downloading timm-0.6.12-py3-none-any.whl (549 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 549.1/549.1 kB 238.6 MB/s eta 0:00:00
Collecting nlpaug<=1.1.10,>=1.1.10
Downloading nlpaug-1.1.10-py3-none-any.whl (410 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.8/410.8 kB 235.6 MB/s eta 0:00:00
Collecting pytorch-lightning<1.8.0,>=1.7.4
Downloading pytorch_lightning-1.7.7-py3-none-any.whl (708 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 708.1/708.1 kB 242.0 MB/s eta 0:00:00
Collecting sentencepiece<0.2.0,>=0.1.95
Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 217.6 MB/s eta 0:00:00
Collecting fairscale<=0.4.6,>=0.4.5
Downloading fairscale-0.4.6.tar.gz (248 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 248.2/248.2 kB 221.5 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting nltk<4.0.0,>=3.4.5
Downloading nltk-3.8-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 227.1 MB/s eta 0:00:00
Collecting defusedxml<=0.7.1,>=0.7.1
Downloading defusedxml-0.7.1-py2.py3-none-any.whl (25 kB)
Collecting omegaconf<2.2.0,>=2.1.1
Downloading omegaconf-2.1.2-py3-none-any.whl (74 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74.7/74.7 kB 191.1 MB/s eta 0:00:00
Collecting torchtext<0.14.0
Downloading torchtext-0.13.1-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 248.0 MB/s eta 0:00:00
Collecting torchvision<0.14.0
Downloading torchvision-0.13.1-cp37-cp37m-manylinux1_x86_64.whl (19.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.1/19.1 MB 190.5 MB/s eta 0:00:00
Collecting Pillow<=9.4.0,>=9.3.0
Downloading Pillow-9.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 250.2 MB/s eta 0:00:00
Collecting seqeval<=1.2.2
Downloading seqeval-1.2.2.tar.gz (43 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.6/43.6 kB 147.1 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting scikit-image<0.20.0,>=0.19.1
Downloading scikit_image-0.19.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (13.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.5/13.5 MB 162.8 MB/s eta 0:00:00
Collecting albumentations<=1.2.0,>=1.1.0
Downloading albumentations-1.2.0-py3-none-any.whl (113 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 113.5/113.5 kB 206.3 MB/s eta 0:00:00
Collecting text-unidecode<=1.3
Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.2/78.2 kB 192.0 MB/s eta 0:00:00
Collecting openmim<=0.2.1,>0.1.5
Downloading openmim-0.2.1-py2.py3-none-any.whl (49 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.7/49.7 kB 172.2 MB/s eta 0:00:00
Collecting accelerate<0.14,>=0.9
Downloading accelerate-0.13.2-py3-none-any.whl (148 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 148.8/148.8 kB 221.9 MB/s eta 0:00:00
Collecting smart-open<5.3.0,>=5.2.1
Downloading smart_open-5.2.1-py3-none-any.whl (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.6/58.6 kB 173.2 MB/s eta 0:00:00
Requirement already satisfied: networkx<3.0,>=2.3 in /usr/local/lib/python3.7/site-packages (from autogluon.tabular[all]==0.6.1->autogluon) (2.6.3)
Collecting fastai<2.8,>=2.3.1
Downloading fastai-2.7.10-py3-none-any.whl (240 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 240.9/240.9 kB 242.0 MB/s eta 0:00:00
Collecting lightgbm<3.4,>=3.3
Downloading lightgbm-3.3.3-py3-none-manylinux1_x86_64.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 257.8 MB/s eta 0:00:00
Collecting xgboost<1.8,>=1.6
Downloading xgboost-1.6.2-py3-none-manylinux2014_x86_64.whl (255.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 255.9/255.9 MB 179.4 MB/s eta 0:00:00
Collecting catboost<1.2,>=1.0
Downloading catboost-1.1.1-cp37-none-manylinux1_x86_64.whl (76.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.6/76.6 MB 185.4 MB/s eta 0:00:00
Collecting statsmodels~=0.13.0
Downloading statsmodels-0.13.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.9/9.9 MB 195.5 MB/s eta 0:00:00
Collecting gluonts~=0.11.0
Downloading gluonts-0.11.6-py3-none-any.whl (1.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 251.1 MB/s eta 0:00:00
Requirement already satisfied: joblib~=1.1 in /usr/local/lib/python3.7/site-packages (from autogluon.timeseries[all]==0.6.1->autogluon) (1.1.0)
Collecting sktime<0.14,>=0.13.1
Downloading sktime-0.13.4-py3-none-any.whl (7.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 180.1 MB/s eta 0:00:00
Collecting pmdarima~=1.8.2
Downloading pmdarima-1.8.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 250.6 MB/s eta 0:00:00
Collecting tbats~=1.1
Downloading tbats-1.1.2-py3-none-any.whl (43 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.8/43.8 kB 143.5 MB/s eta 0:00:00
Collecting gluoncv<0.10.6,>=0.10.5
Downloading gluoncv-0.10.5.post0-py2.py3-none-any.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 258.1 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from autogluon.common==0.6.1->autogluon.core[all]==0.6.1->autogluon) (65.6.3)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/site-packages (from accelerate<0.14,>=0.9->autogluon.multimodal==0.6.1->autogluon) (21.3)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/site-packages (from accelerate<0.14,>=0.9->autogluon.multimodal==0.6.1->autogluon) (5.4.1)
Collecting albumentations<=1.2.0,>=1.1.0
Downloading albumentations-1.1.0-py3-none-any.whl (102 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102.4/102.4 kB 202.0 MB/s eta 0:00:00
Collecting qudida>=0.0.4
Downloading qudida-0.0.4-py3-none-any.whl (3.5 kB)
Collecting opencv-python-headless>=4.1.1
Downloading opencv_python_headless-4.6.0.66-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (48.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.3/48.3 MB 177.0 MB/s eta 0:00:00
Requirement already satisfied: graphviz in /usr/local/lib/python3.7/site-packages (from catboost<1.2,>=1.0->autogluon.tabular[all]==0.6.1->autogluon) (0.8.4)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/site-packages (from catboost<1.2,>=1.0->autogluon.tabular[all]==0.6.1->autogluon) (5.4.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from catboost<1.2,>=1.0->autogluon.tabular[all]==0.6.1->autogluon) (1.16.0)
Collecting toolz>=0.8.2
Downloading toolz-0.12.0-py3-none-any.whl (55 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.8/55.8 kB 154.0 MB/s eta 0:00:00
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.7/site-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.1->autogluon) (2021.11.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.7/site-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.1->autogluon) (2.0.0)
Collecting partd>=0.3.10
Downloading partd-1.3.0-py3-none-any.whl (18 kB)
Collecting click>=6.6
Downloading click-8.1.3-py3-none-any.whl (96 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.6/96.6 kB 184.4 MB/s eta 0:00:00
Collecting msgpack>=0.6.0
Downloading msgpack-1.0.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (299 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 299.8/299.8 kB 227.4 MB/s eta 0:00:00
Collecting zict>=0.1.3
Downloading zict-2.2.0-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.7/site-packages (from distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.1->autogluon) (3.0.3)
Collecting sortedcontainers!=2.0.0,!=2.0.1
Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting tblib>=1.6.0
Downloading tblib-1.7.0-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: tornado>=5 in /usr/local/lib/python3.7/site-packages (from distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.6.1->autogluon) (6.1)
Collecting responses<0.19
Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Collecting xxhash
Downloading xxhash-3.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 213.0/213.0 kB 226.0 MB/s eta 0:00:00
Requirement already satisfied: dill in /usr/local/lib/python3.7/site-packages (from evaluate<=0.3.0->autogluon.multimodal==0.6.1->autogluon) (0.3.4)
Collecting huggingface-hub>=0.7.0
Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 182.4/182.4 kB 211.1 MB/s eta 0:00:00
Collecting tqdm>=4.38.0
Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 188.4 MB/s eta 0:00:00
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/site-packages (from evaluate<=0.3.0->autogluon.multimodal==0.6.1->autogluon) (4.8.2)
Requirement already satisfied: multiprocess in /usr/local/lib/python3.7/site-packages (from evaluate<=0.3.0->autogluon.multimodal==0.6.1->autogluon) (0.70.12.2)
Collecting datasets>=2.0.0
Downloading datasets-2.8.0-py3-none-any.whl (452 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 452.9/452.9 kB 233.9 MB/s eta 0:00:00
Requirement already satisfied: pip in /usr/local/lib/python3.7/site-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.6.1->autogluon) (22.3.1)
Collecting spacy<4
Downloading spacy-3.4.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.4/6.4 MB 225.8 MB/s eta 0:00:00
Collecting fastprogress>=0.2.4
Downloading fastprogress-1.0.3-py3-none-any.whl (12 kB)
Collecting fastcore<1.6,>=1.4.5
Downloading fastcore-1.5.27-py3-none-any.whl (67 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.1/67.1 kB 106.1 MB/s eta 0:00:00
Collecting fastdownload<2,>=0.0.5
Downloading fastdownload-0.0.7-py3-none-any.whl (12 kB)
Collecting yacs
Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Requirement already satisfied: portalocker in /usr/local/lib/python3.7/site-packages (from gluoncv<0.10.6,>=0.10.5->autogluon.vision==0.6.1->autogluon) (2.3.2)
Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/site-packages (from gluoncv<0.10.6,>=0.10.5->autogluon.vision==0.6.1->autogluon) (4.5.4.60)
Collecting autocfg
Downloading autocfg-0.0.8-py3-none-any.whl (13 kB)
Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.7/site-packages (from gluonts~=0.11.0->autogluon.timeseries[all]==0.6.1->autogluon) (4.0.1)
Collecting pydantic~=1.7
Downloading pydantic-1.10.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 191.8 MB/s eta 0:00:00
Collecting py4j
Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.5/200.5 kB 217.9 MB/s eta 0:00:00
Collecting future
Downloading future-0.18.2.tar.gz (829 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 829.2/829.2 kB 255.4 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting importlib-resources>=1.4.0
Downloading importlib_resources-5.10.1-py3-none-any.whl (34 kB)
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.7/site-packages (from jsonschema<=4.8.0->autogluon.multimodal==0.6.1->autogluon) (21.2.0)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0
Downloading pyrsistent-0.19.2-py3-none-any.whl (57 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.5/57.5 kB 167.2 MB/s eta 0:00:00
Requirement already satisfied: wheel in /usr/local/lib/python3.7/site-packages (from lightgbm<3.4,>=3.3->autogluon.tabular[all]==0.6.1->autogluon) (0.38.4)
Collecting regex>=2021.8.3
Downloading regex-2022.10.31-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 757.1/757.1 kB 250.5 MB/s eta 0:00:00
Collecting typish>=1.7.0
Downloading typish-1.9.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.1/45.1 kB 142.6 MB/s eta 0:00:00
Collecting antlr4-python3-runtime==4.8
Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 112.4/112.4 kB 191.7 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting rich
Downloading rich-12.6.0-py3-none-any.whl (237 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 237.5/237.5 kB 222.5 MB/s eta 0:00:00
Collecting model-index
Downloading model_index-0.1.11-py3-none-any.whl (34 kB)
Requirement already satisfied: colorama in /usr/local/lib/python3.7/site-packages (from openmim<=0.2.1,>0.1.5->autogluon.multimodal==0.6.1->autogluon) (0.4.3)
Requirement already satisfied: tabulate in /usr/local/lib/python3.7/site-packages (from openmim<=0.2.1,>0.1.5->autogluon.multimodal==0.6.1->autogluon) (0.8.9)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/site-packages (from pandas!=1.4.0,<1.6,>=1.2.5->autogluon.core[all]==0.6.1->autogluon) (2021.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/site-packages (from pandas!=1.4.0,<1.6,>=1.2.5->autogluon.core[all]==0.6.1->autogluon) (2.8.2)
Requirement already satisfied: Cython!=0.29.18,>=0.29 in /usr/local/lib/python3.7/site-packages (from pmdarima~=1.8.2->autogluon.timeseries[all]==0.6.1->autogluon) (0.29.24)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/site-packages (from pmdarima~=1.8.2->autogluon.timeseries[all]==0.6.1->autogluon) (1.25.11)
Collecting pyDeprecate>=0.3.1
Downloading pyDeprecate-0.3.2-py3-none-any.whl (10 kB)
Collecting tensorboard>=2.9.1
Downloading tensorboard-2.11.0-py3-none-any.whl (6.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 216.5 MB/s eta 0:00:00
Collecting virtualenv
Downloading virtualenv-20.17.1-py3-none-any.whl (8.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 182.2 MB/s eta 0:00:00
Collecting grpcio<=1.43.0,>=1.32.0
Downloading grpcio-1.43.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.1/4.1 MB 167.7 MB/s eta 0:00:00
Collecting frozenlist
Downloading frozenlist-1.3.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (148 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 148.0/148.0 kB 214.2 MB/s eta 0:00:00
Collecting aiosignal
Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting click>=6.6
Downloading click-8.0.4-py3-none-any.whl (97 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.5/97.5 kB 87.3 MB/s eta 0:00:00
Collecting filelock
Downloading filelock-3.8.2-py3-none-any.whl (10 kB)
Requirement already satisfied: protobuf<4.0.0,>=3.15.3 in /usr/local/lib/python3.7/site-packages (from ray[tune]<2.1,>=2.0->autogluon.core[all]==0.6.1->autogluon) (3.19.1)
Collecting tensorboardX>=1.9
Downloading tensorboardX-2.5.1-py2.py3-none-any.whl (125 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.4/125.4 kB 218.6 MB/s eta 0:00:00
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests->autogluon.core[all]==0.6.1->autogluon) (2021.10.8)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests->autogluon.core[all]==0.6.1->autogluon) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests->autogluon.core[all]==0.6.1->autogluon) (3.0.4)
Collecting tifffile>=2019.7.26
Downloading tifffile-2021.11.2-py3-none-any.whl (178 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.9/178.9 kB 225.2 MB/s eta 0:00:00
Requirement already satisfied: imageio>=2.4.1 in /usr/local/lib/python3.7/site-packages (from scikit-image<0.20.0,>=0.19.1->autogluon.multimodal==0.6.1->autogluon) (2.13.1)
Collecting PyWavelets>=1.1.1
Downloading PyWavelets-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.4/6.4 MB 207.4 MB/s eta 0:00:00
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/site-packages (from scikit-learn<1.2,>=1.0.0->autogluon.core[all]==0.6.1->autogluon) (3.0.0)
Collecting deprecated>=1.2.13
Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: numba>=0.53 in /usr/local/lib/python3.7/site-packages (from sktime<0.14,>=0.13.1->autogluon.timeseries[all]==0.6.1->autogluon) (0.53.1)
Collecting patsy>=0.5.2
Downloading patsy-0.5.3-py2.py3-none-any.whl (233 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 233.8/233.8 kB 240.6 MB/s eta 0:00:00
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
Downloading tokenizers-0.13.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.6/7.6 MB 177.3 MB/s eta 0:00:00
Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /usr/local/lib/python3.7/site-packages (from boto3->autogluon.core[all]==0.6.1->autogluon) (0.5.0)
Requirement already satisfied: botocore<1.24.0,>=1.23.17 in /usr/local/lib/python3.7/site-packages (from boto3->autogluon.core[all]==0.6.1->autogluon) (1.23.17)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.7/site-packages (from boto3->autogluon.core[all]==0.6.1->autogluon) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.6.1->autogluon) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.6.1->autogluon) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.6.1->autogluon) (4.28.2)
Requirement already satisfied: setuptools-scm>=4 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.6.1->autogluon) (6.3.2)
Requirement already satisfied: pyparsing>=2.2.1 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.6.1->autogluon) (3.0.6)
Requirement already satisfied: pyarrow>=6.0.0 in /usr/local/lib/python3.7/site-packages (from datasets>=2.0.0->evaluate<=0.3.0->autogluon.multimodal==0.6.1->autogluon) (6.0.1)
Collecting aiohttp
Downloading aiohttp-3.8.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (948 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 948.0/948.0 kB 258.6 MB/s eta 0:00:00
Collecting wrapt<2,>=1.10
Downloading wrapt-1.14.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (75 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.2/75.2 kB 190.7 MB/s eta 0:00:00
Requirement already satisfied: zipp>=3.1.0 in /usr/local/lib/python3.7/site-packages (from importlib-resources>=1.4.0->jsonschema<=4.8.0->autogluon.multimodal==0.6.1->autogluon) (3.6.0)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.7/site-packages (from numba>=0.53->sktime<0.14,>=0.13.1->autogluon.timeseries[all]==0.6.1->autogluon) (0.36.0)
Collecting locket
Downloading locket-1.0.0-py2.py3-none-any.whl (4.4 kB)
Collecting typing-extensions~=4.0
Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Requirement already satisfied: tomli>=1.0.0 in /usr/local/lib/python3.7/site-packages (from setuptools-scm>=4->matplotlib->autogluon.core[all]==0.6.1->autogluon) (1.2.2)
Collecting thinc<8.2.0,>=8.1.0
Downloading thinc-8.1.6-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (814 kB)
Building wheels for collected packages: fairscale, antlr4-python3-runtime, seqeval, future
Successfully built fairscale antlr4-python3-runtime seqeval future
Successfully installed Pillow-9.3.0 PyWavelets-1.3.0 absl-py-1.3.0 accelerate-0.13.2 aiohttp-3.8.3 aiosignal-1.3.1 albumentations-1.1.0 antlr4-python3-runtime-4.8 async-timeout-4.0.2 asynctest-0.13.0 autocfg-0.0.8 autogluon-0.6.1 autogluon.common-0.6.1 autogluon.core-0.6.1 autogluon.features-0.6.1 autogluon.multimodal-0.6.1 autogluon.tabular-0.6.1 autogluon.text-0.6.1 autogluon.timeseries-0.6.1 autogluon.vision-0.6.1 blis-0.7.9 cachetools-5.2.0 catalogue-2.0.8 catboost-1.1.1 charset-normalizer-2.1.1 click-8.0.4 commonmark-0.9.1 confection-0.0.3 cymem-2.0.7 dask-2021.11.2 datasets-2.8.0 defusedxml-0.7.1 deprecated-1.2.13 distlib-0.3.6 distributed-2021.11.2 evaluate-0.3.0 fairscale-0.4.6 fastai-2.7.10 fastcore-1.5.27 fastdownload-0.0.7 fastprogress-1.0.3 filelock-3.8.2 frozenlist-1.3.3 future-0.18.2 gluoncv-0.10.5.post0 gluonts-0.11.6 google-auth-2.15.0 google-auth-oauthlib-0.4.6 grpcio-1.43.0 heapdict-1.0.1 huggingface-hub-0.11.1 hyperopt-0.2.7 importlib-metadata-5.2.0 importlib-resources-5.10.1 jsonschema-4.8.0 langcodes-3.3.0 lightgbm-3.3.3 locket-1.0.0 markdown-3.4.1 model-index-0.1.11 msgpack-1.0.4 multidict-6.0.4 murmurhash-1.0.9 nlpaug-1.1.10 nltk-3.8 nptyping-1.4.4 numpy-1.21.6 oauthlib-3.2.2 omegaconf-2.1.2 opencv-python-headless-4.6.0.66 openmim-0.2.1 ordered-set-4.1.0 partd-1.3.0 pathy-0.10.1 patsy-0.5.3 platformdirs-2.6.0 pmdarima-1.8.5 preshed-3.0.8 py4j-0.10.9.7 pyDeprecate-0.3.2 pyasn1-modules-0.2.8 pydantic-1.10.2 pyrsistent-0.19.2 pytorch-lightning-1.7.7 pytorch-metric-learning-1.3.2 qudida-0.0.4 ray-2.0.1 regex-2022.10.31 requests-oauthlib-1.3.1 responses-0.18.0 rich-12.6.0 scikit-image-0.19.3 scipy-1.7.3 sentencepiece-0.1.97 seqeval-1.2.2 sktime-0.13.4 smart-open-5.2.1 sortedcontainers-2.4.0 spacy-3.4.4 spacy-legacy-3.0.10 spacy-loggers-1.0.4 srsly-2.4.5 statsmodels-0.13.5 tbats-1.1.2 tblib-1.7.0 tensorboard-2.11.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorboardX-2.5.1 text-unidecode-1.3 thinc-8.1.6 tifffile-2021.11.2 
timm-0.6.12 tokenizers-0.13.2 toolz-0.12.0 torch-1.12.1 torchmetrics-0.8.2 torchtext-0.13.1 torchvision-0.13.1 tqdm-4.64.1 transformers-4.23.1 typer-0.7.0 typing-extensions-4.1.1 typish-1.9.3 virtualenv-20.17.1 wasabi-0.10.1 wrapt-1.14.1 xgboost-1.6.2 xxhash-3.1.0 yacs-0.1.8 yarl-1.8.2 zict-2.2.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting kaggle
Using cached kaggle-1.5.12-py3-none-any.whl
Collecting python-slugify
Using cached python_slugify-7.0.0-py2.py3-none-any.whl (9.4 kB)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/site-packages (from kaggle) (4.64.1)
Requirement already satisfied: certifi in /usr/local/lib/python3.7/site-packages (from kaggle) (2021.10.8)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/site-packages (from kaggle) (2.8.2)
Requirement already satisfied: requests in /usr/local/lib/python3.7/site-packages (from kaggle) (2.22.0)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.7/site-packages (from kaggle) (1.16.0)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/site-packages (from kaggle) (1.25.11)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests->kaggle) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests->kaggle) (3.0.4)
Installing collected packages: python-slugify, kaggle
Successfully installed kaggle-1.5.12 python-slugify-7.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
# create the .kaggle directory and an empty kaggle.json file
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
# Fill in your username and key from creating the Kaggle account and API token file
import json
kaggle_username = "your_kaggle_username"  # replace with your Kaggle username
kaggle_key = "your_kaggle_api_key"  # replace with your Kaggle API key (never commit the real key)
# Save the API token to the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))
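As a quick sanity check (a minimal sketch, not part of the template), the same write can be mimicked against a temporary path to confirm the file round-trips as valid JSON and carries the `600` permissions the Kaggle CLI expects:

```python
import json
import os
import stat
import tempfile

# Hypothetical placeholder credentials; the real notebook writes to /root/.kaggle/kaggle.json
creds = {"username": "your_kaggle_username", "key": "your_kaggle_api_key"}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "kaggle.json")
    # Write the token exactly as the cell above does
    with open(path, "w") as f:
        f.write(json.dumps(creds))
    os.chmod(path, 0o600)  # kaggle refuses files readable by other users

    # Read it back and inspect permissions
    with open(path) as f:
        loaded = json.load(f)
    mode = stat.S_IMODE(os.stat(path).st_mode)

assert loaded == creds
assert mode == 0o600
```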
# Download the dataset, it will be in a .zip file so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it you can use the -o command to overwrite the file
!unzip -o bike-sharing-demand.zip
Downloading bike-sharing-demand.zip to /root/aws_mle_nanodegree/project_1
100%|████████████████████████████████████████| 189k/189k [00:00<00:00, 7.05MB/s]
Archive:  bike-sharing-demand.zip
  inflating: sampleSubmission.csv
  inflating: test.csv
  inflating: train.csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from autogluon.tabular import TabularPredictor
/usr/local/lib/python3.7/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
# Create the train dataset in pandas by reading the csv
# Parse the datetime column so the pandas `dt` accessor features are available later
train = pd.read_csv("train.csv", parse_dates=["datetime"])
train.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 3 | 10 | 13 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 0 | 1 | 1 |
train.columns
Index(['datetime', 'season', 'holiday', 'workingday', 'weather', 'temp',
'atemp', 'humidity', 'windspeed', 'casual', 'registered', 'count'],
dtype='object')
# Simple output of the train dataset to view some of the min/max/varition of the dataset features.
train.describe()
| season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.00000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 |
| mean | 2.506614 | 0.028569 | 0.680875 | 1.418427 | 20.23086 | 23.655084 | 61.886460 | 12.799395 | 36.021955 | 155.552177 | 191.574132 |
| std | 1.116174 | 0.166599 | 0.466159 | 0.633839 | 7.79159 | 8.474601 | 19.245033 | 8.164537 | 49.960477 | 151.039033 | 181.144454 |
| min | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.82000 | 0.760000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| 25% | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 13.94000 | 16.665000 | 47.000000 | 7.001500 | 4.000000 | 36.000000 | 42.000000 |
| 50% | 3.000000 | 0.000000 | 1.000000 | 1.000000 | 20.50000 | 24.240000 | 62.000000 | 12.998000 | 17.000000 | 118.000000 | 145.000000 |
| 75% | 4.000000 | 0.000000 | 1.000000 | 2.000000 | 26.24000 | 31.060000 | 77.000000 | 16.997900 | 49.000000 | 222.000000 | 284.000000 |
| max | 4.000000 | 1.000000 | 1.000000 | 4.000000 | 41.00000 | 45.455000 | 100.000000 | 56.996900 | 367.000000 | 886.000000 | 977.000000 |
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   datetime    10886 non-null  object
 1   season      10886 non-null  int64
 2   holiday     10886 non-null  int64
 3   workingday  10886 non-null  int64
 4   weather     10886 non-null  int64
 5   temp        10886 non-null  float64
 6   atemp       10886 non-null  float64
 7   humidity    10886 non-null  int64
 8   windspeed   10886 non-null  float64
 9   casual      10886 non-null  int64
 10  registered  10886 non-null  int64
 11  count       10886 non-null  int64
dtypes: float64(3), int64(8), object(1)
memory usage: 1020.7+ KB
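Once `datetime` is parsed, the pandas `dt` accessor exposes components such as the hour and day of week, which are useful engineered features for this dataset. A minimal sketch on two hypothetical timestamps in the same format as the Kaggle data:

```python
import pandas as pd

# Two sample timestamps in the same format as the Kaggle data
sample = pd.DataFrame({"datetime": ["2011-01-01 00:00:00", "2011-01-01 13:00:00"]})
sample["datetime"] = pd.to_datetime(sample["datetime"])

# Extract components via the `dt` accessor
sample["hour"] = sample["datetime"].dt.hour
sample["dayofweek"] = sample["datetime"].dt.dayofweek

print(sample[["hour", "dayofweek"]])
# hour is [0, 13]; 2011-01-01 was a Saturday, so dayofweek is 5 for both rows
```

The same pattern applied to `train["datetime"]` yields the extra columns used in the feature-engineering step of this project.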
# Create the test pandas dataframe by reading the csv, remembering to parse the datetime column
test = pd.read_csv("test.csv", parse_dates=["datetime"])
test.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-20 00:00:00 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 |
| 1 | 2011-01-20 01:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 2 | 2011-01-20 02:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 3 | 2011-01-20 03:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
| 4 | 2011-01-20 04:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
test.columns
Index(['datetime', 'season', 'holiday', 'workingday', 'weather', 'temp',
'atemp', 'humidity', 'windspeed'],
dtype='object')
test.describe()
| season | holiday | workingday | weather | temp | atemp | humidity | windspeed | |
|---|---|---|---|---|---|---|---|---|
| count | 6493.000000 | 6493.000000 | 6493.000000 | 6493.000000 | 6493.000000 | 6493.000000 | 6493.000000 | 6493.000000 |
| mean | 2.493300 | 0.029108 | 0.685815 | 1.436778 | 20.620607 | 24.012865 | 64.125212 | 12.631157 |
| std | 1.091258 | 0.168123 | 0.464226 | 0.648390 | 8.059583 | 8.782741 | 19.293391 | 8.250151 |
| min | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.820000 | 0.000000 | 16.000000 | 0.000000 |
| 25% | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 13.940000 | 16.665000 | 49.000000 | 7.001500 |
| 50% | 3.000000 | 0.000000 | 1.000000 | 1.000000 | 21.320000 | 25.000000 | 65.000000 | 11.001400 |
| 75% | 3.000000 | 0.000000 | 1.000000 | 2.000000 | 27.060000 | 31.060000 | 81.000000 | 16.997900 |
| max | 4.000000 | 1.000000 | 1.000000 | 4.000000 | 40.180000 | 50.000000 | 100.000000 | 55.998600 |
# Read the sample submission file the same way as the train and test datasets
submission = pd.read_csv("sampleSubmission.csv")
submission.head()
| datetime | count | |
|---|---|---|
| 0 | 2011-01-20 00:00:00 | 0 |
| 1 | 2011-01-20 01:00:00 | 0 |
| 2 | 2011-01-20 02:00:00 | 0 |
| 3 | 2011-01-20 03:00:00 | 0 |
| 4 | 2011-01-20 04:00:00 | 0 |
submission.describe()
| count | |
|---|---|
| count | 6493.0 |
| mean | 0.0 |
| std | 0.0 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 0.0 |
| 75% | 0.0 |
| max | 0.0 |
Requirements:

- `count` is the target, so it is the label we are setting.
- Drop the `casual` and `registered` columns, as they are also not present in the test dataset.
- Use `root_mean_squared_error` as the metric for evaluation.
- Use the `best_quality` preset to focus on creating the best model.
# I drop these two columns, since they are not present in the test dataset, instead of using ignored_columns.
train.drop(columns=["casual", "registered"], inplace=True)

predictor = TabularPredictor(
    label="count",
    eval_metric="root_mean_squared_error",
).fit(
    train_data=train,
    time_limit=600,
    presets="best_quality",
)
No path specified. Models will be saved in: "AutogluonModels/ag-20221226_115948/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221226_115948/"
AutoGluon Version: 0.6.1
Python Version: 3.7.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Oct 26 20:36:53 UTC 2022
Train Data Rows: 10886
Train Data Columns: 9
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 6923.01 MB
Train Data (Original) Memory Usage: 1.52 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting DatetimeFeatureGenerator...
/usr/local/lib/python3.7/site-packages/autogluon/features/generators/datetime.py:59: FutureWarning: casting datetime64[ns, UTC] values to int64 with .astype(...) is deprecated and will raise in a future version. Use .view(...) instead.
good_rows = series[~series.isin(bad_rows)].astype(np.int64)
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 5 | ['season', 'holiday', 'workingday', 'weather', 'humidity']
('object', ['datetime_as_object']) : 1 | ['datetime']
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
0.3s = Fit runtime
9 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.98 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.38s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.64s of the 599.61s of remaining time.
-101.5462 = Validation score (-root_mean_squared_error)
0.03s = Training runtime
0.1s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 396.45s of the 596.42s of remaining time.
-84.1251 = Validation score (-root_mean_squared_error)
0.03s = Training runtime
0.1s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 396.08s of the 596.05s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-131.4609 = Validation score (-root_mean_squared_error)
65.35s = Training runtime
6.55s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 319.99s of the 519.96s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-131.0542 = Validation score (-root_mean_squared_error)
30.42s = Training runtime
1.47s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 284.8s of the 484.77s of remaining time.
-116.5443 = Validation score (-root_mean_squared_error)
10.91s = Training runtime
0.57s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 270.57s of the 470.53s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-130.5332 = Validation score (-root_mean_squared_error)
201.81s = Training runtime
0.18s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 64.79s of the 264.76s of remaining time.
-124.5881 = Validation score (-root_mean_squared_error)
5.16s = Training runtime
0.55s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 56.2s of the 256.16s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-138.3722 = Validation score (-root_mean_squared_error)
71.59s = Training runtime
0.41s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 179.16s of remaining time.
-84.1251 = Validation score (-root_mean_squared_error)
0.53s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 178.55s of the 178.52s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-60.3946 = Validation score (-root_mean_squared_error)
55.85s = Training runtime
3.75s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 117.33s of the 117.3s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-55.2179 = Validation score (-root_mean_squared_error)
25.77s = Training runtime
0.22s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 86.96s of the 86.94s of remaining time.
-53.4065 = Validation score (-root_mean_squared_error)
26.52s = Training runtime
0.64s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 57.26s of the 57.23s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-55.7444 = Validation score (-root_mean_squared_error)
59.33s = Training runtime
0.06s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -6.3s of remaining time.
-53.1096 = Validation score (-root_mean_squared_error)
0.39s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 606.92s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221226_115948/")
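With training complete, the test-set predictions will eventually be written into the sample submission. Kaggle rejects submissions with negative counts, and regression models can produce them, so predictions should be clipped at zero first. A hedged sketch of that step, with a small hard-coded Series standing in for the real `predictor.predict(test)` output:

```python
import pandas as pd

# Stand-in for `predictions = predictor.predict(test)`
predictions = pd.Series([12.3, -4.1, 250.0, -0.5])

# Kaggle rejects negative counts, so clip at zero
predictions = predictions.clip(lower=0)

# Assemble a submission frame in the sampleSubmission.csv layout
submission = pd.DataFrame({
    "datetime": ["2011-01-20 00:00:00", "2011-01-20 01:00:00",
                 "2011-01-20 02:00:00", "2011-01-20 03:00:00"],
    "count": predictions,
})
submission.to_csv("submission.csv", index=False)

print(predictions.tolist())
# [12.3, 0.0, 250.0, 0.0]
```

In the real notebook the `datetime` column comes from `submission["datetime"]` and the file is uploaded with `kaggle competitions submit`.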
predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -53.109600 14.614315 553.148430 0.001307 0.387465 3 True 14
1 RandomForestMSE_BAG_L2 -53.406479 10.585600 411.811534 0.637849 26.515250 2 True 12
2 LightGBM_BAG_L2 -55.217867 10.165047 411.070628 0.217296 25.774344 2 True 11
3 CatBoost_BAG_L2 -55.744445 10.012650 444.625488 0.064899 59.329205 2 True 13
4 LightGBMXT_BAG_L2 -60.394630 13.692964 441.142166 3.745213 55.845882 2 True 10
5 KNeighborsDist_BAG_L1 -84.125061 0.103698 0.029283 0.103698 0.029283 1 True 2
6 WeightedEnsemble_L2 -84.125061 0.104856 0.556873 0.001159 0.527590 2 True 9
7 KNeighborsUnif_BAG_L1 -101.546199 0.103664 0.031033 0.103664 0.031033 1 True 1
8 RandomForestMSE_BAG_L1 -116.544294 0.569369 10.905250 0.569369 10.905250 1 True 5
9 ExtraTreesMSE_BAG_L1 -124.588053 0.554608 5.158636 0.554608 5.158636 1 True 7
10 CatBoost_BAG_L1 -130.533194 0.177396 201.808275 0.177396 201.808275 1 True 6
11 LightGBM_BAG_L1 -131.054162 1.472807 30.420964 1.472807 30.420964 1 True 4
12 LightGBMXT_BAG_L1 -131.460909 6.551322 65.351942 6.551322 65.351942 1 True 3
13 NeuralNetFastAI_BAG_L1 -138.372209 0.414887 71.590901 0.414887 71.590901 1 True 8
Number of models trained: 14
Types of models trained:
{'StackerEnsembleModel_NNFastAiTabular', 'WeightedEnsembleModel', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_CatBoost'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221226_115948/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
'KNeighborsDist_BAG_L1': -84.12506123181602,
'LightGBMXT_BAG_L1': -131.46090891834504,
'LightGBM_BAG_L1': -131.054161598899,
'RandomForestMSE_BAG_L1': -116.54429428704391,
'CatBoost_BAG_L1': -130.5331939673838,
'ExtraTreesMSE_BAG_L1': -124.58805258915959,
'NeuralNetFastAI_BAG_L1': -138.37220877327402,
'WeightedEnsemble_L2': -84.12506123181602,
'LightGBMXT_BAG_L2': -60.394630458831784,
'LightGBM_BAG_L2': -55.21786685203879,
'RandomForestMSE_BAG_L2': -53.40647918962767,
'CatBoost_BAG_L2': -55.74444485320961,
'WeightedEnsemble_L3': -53.109600407057876},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/LightGBMXT_BAG_L1/',
'LightGBM_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/LightGBM_BAG_L1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/CatBoost_BAG_L1/',
'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/ExtraTreesMSE_BAG_L1/',
'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20221226_115948/models/NeuralNetFastAI_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20221226_115948/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20221226_115948/models/LightGBMXT_BAG_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20221226_115948/models/LightGBM_BAG_L2/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20221226_115948/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2': 'AutogluonModels/ag-20221226_115948/models/CatBoost_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20221226_115948/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.031032800674438477,
'KNeighborsDist_BAG_L1': 0.02928328514099121,
'LightGBMXT_BAG_L1': 65.35194182395935,
'LightGBM_BAG_L1': 30.420964002609253,
'RandomForestMSE_BAG_L1': 10.90524959564209,
'CatBoost_BAG_L1': 201.8082754611969,
'ExtraTreesMSE_BAG_L1': 5.15863561630249,
'NeuralNetFastAI_BAG_L1': 71.59090113639832,
'WeightedEnsemble_L2': 0.5275900363922119,
'LightGBMXT_BAG_L2': 55.845882415771484,
'LightGBM_BAG_L2': 25.774344205856323,
'RandomForestMSE_BAG_L2': 26.515249967575073,
'CatBoost_BAG_L2': 59.32920455932617,
'WeightedEnsemble_L3': 0.3874647617340088},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.10366415977478027,
'KNeighborsDist_BAG_L1': 0.1036977767944336,
'LightGBMXT_BAG_L1': 6.551321506500244,
'LightGBM_BAG_L1': 1.4728071689605713,
'RandomForestMSE_BAG_L1': 0.5693690776824951,
'CatBoost_BAG_L1': 0.17739629745483398,
'ExtraTreesMSE_BAG_L1': 0.5546078681945801,
'NeuralNetFastAI_BAG_L1': 0.4148869514465332,
'WeightedEnsemble_L2': 0.0011587142944335938,
'LightGBMXT_BAG_L2': 3.7452127933502197,
'LightGBM_BAG_L2': 0.21729588508605957,
'RandomForestMSE_BAG_L2': 0.6378493309020996,
'CatBoost_BAG_L2': 0.06489896774291992,
'WeightedEnsemble_L3': 0.001306772232055664},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -53.109600 14.614315 553.148430
1 RandomForestMSE_BAG_L2 -53.406479 10.585600 411.811534
2 LightGBM_BAG_L2 -55.217867 10.165047 411.070628
3 CatBoost_BAG_L2 -55.744445 10.012650 444.625488
4 LightGBMXT_BAG_L2 -60.394630 13.692964 441.142166
5 KNeighborsDist_BAG_L1 -84.125061 0.103698 0.029283
6 WeightedEnsemble_L2 -84.125061 0.104856 0.556873
7 KNeighborsUnif_BAG_L1 -101.546199 0.103664 0.031033
8 RandomForestMSE_BAG_L1 -116.544294 0.569369 10.905250
9 ExtraTreesMSE_BAG_L1 -124.588053 0.554608 5.158636
10 CatBoost_BAG_L1 -130.533194 0.177396 201.808275
11 LightGBM_BAG_L1 -131.054162 1.472807 30.420964
12 LightGBMXT_BAG_L1 -131.460909 6.551322 65.351942
13 NeuralNetFastAI_BAG_L1 -138.372209 0.414887 71.590901
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.001307 0.387465 3 True
1 0.637849 26.515250 2 True
2 0.217296 25.774344 2 True
3 0.064899 59.329205 2 True
4 3.745213 55.845882 2 True
5 0.103698 0.029283 1 True
6 0.001159 0.527590 2 True
7 0.103664 0.031033 1 True
8 0.569369 10.905250 1 True
9 0.554608 5.158636 1 True
10 0.177396 201.808275 1 True
11 1.472807 30.420964 1 True
12 6.551322 65.351942 1 True
13 0.414887 71.590901 1 True
fit_order
0 14
1 12
2 11
3 13
4 10
5 2
6 9
7 1
8 5
9 7
10 6
11 4
12 3
13 8 }
predictor.leaderboard(silent=True)
| | model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WeightedEnsemble_L3 | -53.109600 | 14.614315 | 553.148430 | 0.001307 | 0.387465 | 3 | True | 14 |
| 1 | RandomForestMSE_BAG_L2 | -53.406479 | 10.585600 | 411.811534 | 0.637849 | 26.515250 | 2 | True | 12 |
| 2 | LightGBM_BAG_L2 | -55.217867 | 10.165047 | 411.070628 | 0.217296 | 25.774344 | 2 | True | 11 |
| 3 | CatBoost_BAG_L2 | -55.744445 | 10.012650 | 444.625488 | 0.064899 | 59.329205 | 2 | True | 13 |
| 4 | LightGBMXT_BAG_L2 | -60.394630 | 13.692964 | 441.142166 | 3.745213 | 55.845882 | 2 | True | 10 |
| 5 | KNeighborsDist_BAG_L1 | -84.125061 | 0.103698 | 0.029283 | 0.103698 | 0.029283 | 1 | True | 2 |
| 6 | WeightedEnsemble_L2 | -84.125061 | 0.104856 | 0.556873 | 0.001159 | 0.527590 | 2 | True | 9 |
| 7 | KNeighborsUnif_BAG_L1 | -101.546199 | 0.103664 | 0.031033 | 0.103664 | 0.031033 | 1 | True | 1 |
| 8 | RandomForestMSE_BAG_L1 | -116.544294 | 0.569369 | 10.905250 | 0.569369 | 10.905250 | 1 | True | 5 |
| 9 | ExtraTreesMSE_BAG_L1 | -124.588053 | 0.554608 | 5.158636 | 0.554608 | 5.158636 | 1 | True | 7 |
| 10 | CatBoost_BAG_L1 | -130.533194 | 0.177396 | 201.808275 | 0.177396 | 201.808275 | 1 | True | 6 |
| 11 | LightGBM_BAG_L1 | -131.054162 | 1.472807 | 30.420964 | 1.472807 | 30.420964 | 1 | True | 4 |
| 12 | LightGBMXT_BAG_L1 | -131.460909 | 6.551322 | 65.351942 | 6.551322 | 65.351942 | 1 | True | 3 |
| 13 | NeuralNetFastAI_BAG_L1 | -138.372209 | 0.414887 | 71.590901 | 0.414887 | 71.590901 | 1 | True | 8 |
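Note that `score_val` above is the negated RMSE: AutoGluon flips the metric's sign so that higher is always better. A minimal sketch of recovering the raw validation RMSE from a leaderboard-like frame (toy values copied from the table above):

```python
import pandas as pd

# Toy frame with the leaderboard's score_val column (values illustrative)
lb = pd.DataFrame({
    "model": ["WeightedEnsemble_L3", "KNeighborsDist_BAG_L1"],
    "score_val": [-53.109600, -84.125061],
})

# score_val is -RMSE, so negate it to read the raw validation error
lb["rmse_val"] = -lb["score_val"]
print(lb)
```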
fig = predictor.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val").figure
fig.tight_layout()
fig.savefig('img/exp_1_leaderboard.png')
predictions = predictor.predict(test)
predictions.head()
0    23.355633
1    41.986988
2    45.374504
3    49.315769
4    52.064514
Name: count, dtype: float32
# Describe the `predictions` series to see if there are any negative values
predictions.describe()
count    6493.000000
mean      100.940247
std        89.856956
min         3.016292
25%        20.067400
50%        64.116325
75%       167.614639
max       365.451843
Name: count, dtype: float64
# How many negative values do we have?
(predictions<0).sum()
0
# All predictions are non-negative, so no clipping is needed here
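Had any predictions been negative, Kaggle's RMSLE evaluation would reject them, so the safe pattern is to floor predictions at zero before submitting. A minimal sketch using pandas' `Series.clip` on toy values:

```python
import pandas as pd

# Illustrative predictions containing one negative value
preds = pd.Series([23.36, -1.5, 45.37])

# The bike-sharing metric (RMSLE) requires non-negative counts,
# so floor any negative prediction at zero before submitting
preds = preds.clip(lower=0)
print(preds.tolist())
```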
submission["count"] = predictions
submission.to_csv("submission.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "initial submission 1"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 316kB/s]
Successfully submitted to Bike Sharing Demand
My Submissions
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName date description status publicScore privateScore
------------------------------ ------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ----------- ------------
submission.csv 2022-12-26 12:10:34 initial submission 1 complete 1.79067 1.79067
submission_new_hpo_3f.csv 2022-12-25 20:15:51 hp tuning 3f complete 0.50219 0.50219
submission_new_hpo_3e.csv 2022-12-25 19:59:07 hp tuning 3e complete 0.53176 0.53176
submission_new_hpo_3c.csv 2022-12-25 19:35:06 hpo 3c num_bag_sets = 5 complete 0.63215 0.63215
tail: error writing 'standard output': Broken pipe
Traceback (most recent call last):
File "/usr/local/bin/kaggle", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/kaggle/cli.py", line 67, in main
out = args.func(**command_args)
File "/usr/local/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 618, in competition_submissions_cli
self.print_table(submissions, fields)
File "/usr/local/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 2253, in print_table
print(row_format.format(*i_fields))
BrokenPipeError: [Errno 32] Broken pipe
# Plot a histogram of every feature to show its distribution. This is part of the exploratory data analysis
train.hist(figsize = (20,10))
plt.show()
## Let's visualize correlations using a heatmap
plt.figure(figsize=(10,8))
sns.heatmap(train.corr())
plt.show()
## Let's view pairwise scatterplots with weather facet
sns.pairplot(train, hue="weather")
plt.show()
Note: Weather description
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
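For readability in plots, the numeric weather codes can be mapped to short labels; a small sketch (the label strings are my own shorthand, not from the dataset):

```python
import pandas as pd

# Shorthand labels for the four weather codes described above (my naming)
weather_labels = {1: "Clear", 2: "Mist", 3: "Light rain/snow", 4: "Heavy rain/snow"}

weather = pd.Series([1, 2, 3, 1])
print(weather.map(weather_labels).tolist())
```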
plt.figure(figsize=(10,5))
plt.plot(train['datetime'],train['count'].ewm(span = 24).mean())
plt.title('Bike Sharing Demand over 2011-2012.')
plt.xlabel('Timeframe')
plt.ylabel('Hourly bike count')
start_number=0
end_number = len(train['datetime'])
step_number = 24*30
plt.xticks(range(start_number,end_number,step_number),rotation=90)
plt.show()
## Observation: there is a general growth trend (2011 vs 2012)
## Let's look at a one-week timeframe
filt = (train['datetime']>='2011-01-03') & (train['datetime']<='2011-01-10')
plt.figure(figsize=(8,4))
plt.plot(train[filt]['datetime'],train[filt]['count'])
plt.title('Bike Sharing Demand from Jan 3 2011 to Jan 10 2011.')
plt.xlabel('Timeframe')
plt.ylabel('Hourly bike count')
start_number=0
end_number = len(train[filt]['datetime'])
step_number = 10
plt.grid(alpha=0.3)
plt.xticks(range(start_number,end_number,step_number),rotation=90)
plt.show()
## Observations: strong hourly seasonality; lower demand at weekends
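The weekend dip noted above can be checked numerically by averaging counts per day of week. A minimal sketch on synthetic data (with the real data, group `train['count']` by `train['datetime'].dt.dayofweek` instead):

```python
import pandas as pd

# Toy hourly counts over two weeks: weekdays busy, weekends quiet
idx = pd.date_range("2011-01-03", periods=14 * 24, freq="h")
counts = pd.Series([10 if ts.dayofweek >= 5 else 50 for ts in idx], index=idx)

# Mean hourly count per day of week (0=Monday ... 6=Sunday)
by_dow = counts.groupby(idx.dayofweek).mean()
print(by_dow)
```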
## Let's look at a one-day timeframe
plt.figure(figsize=(8,4))
filt = (train['datetime']>='2011-01-04') & (train['datetime']<'2011-01-05')
plt.plot(train[filt]['datetime'],train[filt]['count'])
plt.title('Bike Sharing Demand for Jan 4 2011.')
plt.xlabel('Timeframe')
plt.ylabel('Hourly bike count')
plt.grid(alpha=0.3)
plt.xticks(rotation=90)
plt.show()
## Let's look at another one-day timeframe
plt.figure(figsize=(8,4))
filt = (train['datetime']>='2012-01-05') & (train['datetime']<'2012-01-06')
plt.plot(train[filt]['datetime'],train[filt]['count'])
plt.title('Bike Sharing Demand for Jan 5 2012.')
plt.xlabel('Timeframe')
plt.ylabel('Hourly bike count')
plt.grid(alpha=0.3)
plt.xticks(rotation=90)
plt.show()
## Observation: there are 3 spikes in demand, across morning (7am-9am), lunch (11am-1pm), and evening (4pm-7pm). On the other hand, demand falls to its lowest levels from 11pm until 6am.
# Create new features from the datetime column
train['datetime'] = pd.to_datetime(train['datetime'])
train['datetime_hour'] = train['datetime'].dt.hour
train['datetime_day'] = train['datetime'].dt.day
train['datetime_week'] = train['datetime'].dt.week
train['datetime_month'] = train['datetime'].dt.month
train['datetime_year'] = train['datetime'].dt.year
train['datetime_dayofweek'] = train['datetime'].dt.dayofweek
test['datetime'] = pd.to_datetime(test['datetime'])
test['datetime_hour'] = test['datetime'].dt.hour
test['datetime_day'] = test['datetime'].dt.day
test['datetime_week'] = test['datetime'].dt.week
test['datetime_month'] = test['datetime'].dt.month
test['datetime_year'] = test['datetime'].dt.year
test['datetime_dayofweek'] = test['datetime'].dt.dayofweek
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:6: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated. Please use Series.dt.isocalendar().week instead.
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:14: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated. Please use Series.dt.isocalendar().week instead.
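The warning comes from `Series.dt.week`, which is deprecated. A minimal sketch of the recommended replacement on toy dates (`isocalendar().week` returns UInt32, so cast back to int to keep the dtype consistent):

```python
import pandas as pd

dts = pd.to_datetime(pd.Series(["2011-01-01", "2011-01-10"]))

# Deprecated: dts.dt.week
# Recommended replacement:
week = dts.dt.isocalendar().week.astype(int)
print(week.tolist())
```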
## Add an hour-category feature: morning (1), lunch (2), evening (3), night (4), other (0)
def extract_hour_category(h):
if h in [7,8,9]:
return 1
elif h in [11,12,13]:
return 2
elif h in [17,18,19]:
return 3
elif h in [23,0,1,2,3,4,5]:
return 4
else:
return 0
train['hour_category'] = train['datetime_hour'].apply(extract_hour_category)
test['hour_category'] = test['datetime_hour'].apply(extract_hour_category)
train["season"] = train["season"].astype("category")
train["weather"] = train["weather"].astype("category")
test["season"] = test["season"].astype("category")
test["weather"] = test["weather"].astype("category")
train['hour_category'] = train["hour_category"].astype("category")
test['hour_category'] = test["hour_category"].astype("category")
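Casting these encoded columns to `category` matters because otherwise the integer codes would be treated as ordered magnitudes rather than discrete classes. A minimal sketch of the dtype change on toy data (not the real columns):

```python
import pandas as pd

season = pd.Series([1, 2, 3, 4, 1])

# As int64 the codes look like magnitudes; as category they become discrete labels
season_cat = season.astype("category")
print(season.dtype, season_cat.dtype)
```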
# View our new features
train.head()
| | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | datetime_hour | datetime_day | datetime_month | datetime_year | datetime_dayofweek | datetime_week | hour_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 | 0 | 1 | 1 | 2011 | 5 | 52 | 4 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 | 1 | 1 | 1 | 2011 | 5 | 52 | 4 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 | 2 | 1 | 1 | 2011 | 5 | 52 | 4 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 | 3 | 1 | 1 | 2011 | 5 | 52 | 4 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 | 4 | 1 | 1 | 2011 | 5 | 52 | 4 |
# View the histogram of all features again, now including the new datetime and hour-category features
import matplotlib.pyplot as plt
train.hist(figsize = (20,12))
plt.show()
sns.pairplot(train, hue="hour_category")
<seaborn.axisgrid.PairGrid at 0x7f5290cdbd90>
filt = (train['datetime_year']==2012) #& (train['datetime_month']==4)
plt.bar(train[filt]['datetime_hour'],train[filt]['count'])
plt.axhline(train[filt]['count'].quantile(0.75),c='black',linestyle='--')
plt.axvline(7, c='r', linestyle='--')
plt.axvline(9, c='r', linestyle='--')
plt.axvline(11, c='r', linestyle='--')
plt.axvline(13, c='r', linestyle='--')
plt.axvline(16, c='r', linestyle='--')
plt.axvline(19, c='r', linestyle='--')
<matplotlib.lines.Line2D at 0x7f528ad90550>
train['count'].describe()
count    10886.000000
mean       191.574132
std        181.144454
min          1.000000
25%         42.000000
50%        145.000000
75%        284.000000
max        977.000000
Name: count, dtype: float64
predictor_new_features_2a = TabularPredictor(
label = 'count',
eval_metric = 'root_mean_squared_error',
).fit(
train_data = train,
time_limit = 600,
presets='best_quality'
)
No path specified. Models will be saved in: "AutogluonModels/ag-20221226_142547/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221226_142547/"
AutoGluon Version: 0.6.1
Python Version: 3.7.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Oct 26 20:36:53 UTC 2022
Train Data Rows: 10886
Train Data Columns: 16
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 5327.31 MB
Train Data (Original) Memory Usage: 1.17 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Fitting DatetimeFeatureGenerator...
/usr/local/lib/python3.7/site-packages/autogluon/features/generators/datetime.py:59: FutureWarning: casting datetime64[ns, UTC] values to int64 with .astype(...) is deprecated and will raise in a future version. Use .view(...) instead.
good_rows = series[~series.isin(bad_rows)].astype(np.int64)
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('category', []) : 3 | ['season', 'weather', 'hour_category']
('datetime', []) : 1 | ['datetime']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 9 | ['holiday', 'workingday', 'humidity', 'datetime_hour', 'datetime_day', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 3 | ['season', 'weather', 'hour_category']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 6 | ['humidity', 'datetime_hour', 'datetime_day', 'datetime_month', 'datetime_dayofweek', ...]
('int', ['bool']) : 3 | ['holiday', 'workingday', 'datetime_year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
0.3s = Fit runtime
16 features in original data used to generate 20 features in processed data.
Train Data (Processed) Memory Usage: 1.29 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.36s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.66s of the 599.64s of remaining time.
-101.5462 = Validation score (-root_mean_squared_error)
0.07s = Training runtime
0.11s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.23s of the 599.21s of remaining time.
-84.1251 = Validation score (-root_mean_squared_error)
0.06s = Training runtime
0.11s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 398.78s of the 598.76s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-32.9724 = Validation score (-root_mean_squared_error)
105.5s = Training runtime
17.44s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 280.63s of the 480.61s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-33.5406 = Validation score (-root_mean_squared_error)
52.79s = Training runtime
3.46s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 201.19s of the 401.17s of remaining time.
-38.2831 = Validation score (-root_mean_squared_error)
17.86s = Training runtime
0.9s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 179.72s of the 379.7s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-34.2232 = Validation score (-root_mean_squared_error)
171.68s = Training runtime
0.16s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 188.41s of remaining time.
-31.6532 = Validation score (-root_mean_squared_error)
0.67s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 187.64s of the 187.62s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-31.1273 = Validation score (-root_mean_squared_error)
35.7s = Training runtime
0.7s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 147.59s of the 147.57s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-30.4888 = Validation score (-root_mean_squared_error)
27.82s = Training runtime
0.31s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 115.19s of the 115.17s of remaining time.
-31.3546 = Validation score (-root_mean_squared_error)
29.86s = Training runtime
0.68s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 82.25s of the 82.23s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-30.3779 = Validation score (-root_mean_squared_error)
79.54s = Training runtime
0.13s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -1.76s of remaining time.
-30.1069 = Validation score (-root_mean_squared_error)
0.29s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 602.26s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221226_142547/")
predictor_new_features_2a.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -30.106888 24.001545 521.170630 0.001105 0.292291 3 True 12
1 CatBoost_BAG_L2 -30.377937 22.302106 427.497371 0.125932 79.537608 2 True 11
2 LightGBM_BAG_L2 -30.488781 22.488722 375.780792 0.312548 27.821029 2 True 9
3 LightGBMXT_BAG_L2 -31.127338 22.881001 383.662378 0.704827 35.702615 2 True 8
4 RandomForestMSE_BAG_L2 -31.354564 22.857134 377.817086 0.680960 29.857323 2 True 10
5 WeightedEnsemble_L2 -31.653175 22.069522 348.557548 0.001258 0.665796 2 True 7
6 LightGBMXT_BAG_L1 -32.972358 17.436497 105.503806 17.436497 105.503806 1 True 3
7 LightGBM_BAG_L1 -33.540630 3.461847 52.791747 3.461847 52.791747 1 True 4
8 CatBoost_BAG_L1 -34.223240 0.159176 171.680278 0.159176 171.680278 1 True 6
9 RandomForestMSE_BAG_L1 -38.283140 0.903128 17.857386 0.903128 17.857386 1 True 5
10 KNeighborsDist_BAG_L1 -84.125061 0.107616 0.058535 0.107616 0.058535 1 True 2
11 KNeighborsUnif_BAG_L1 -101.546199 0.107910 0.068011 0.107910 0.068011 1 True 1
Number of models trained: 12
Types of models trained:
{'WeightedEnsembleModel', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_CatBoost'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 3 | ['season', 'weather', 'hour_category']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 6 | ['humidity', 'datetime_hour', 'datetime_day', 'datetime_month', 'datetime_dayofweek', ...]
('int', ['bool']) : 3 | ['holiday', 'workingday', 'datetime_year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221226_142547/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
'KNeighborsDist_BAG_L1': -84.12506123181602,
'LightGBMXT_BAG_L1': -32.972357766618615,
'LightGBM_BAG_L1': -33.54062969122618,
'RandomForestMSE_BAG_L1': -38.28313968009453,
'CatBoost_BAG_L1': -34.2232398416045,
'WeightedEnsemble_L2': -31.653175350057957,
'LightGBMXT_BAG_L2': -31.12733811020385,
'LightGBM_BAG_L2': -30.488780530107636,
'RandomForestMSE_BAG_L2': -31.354563539567376,
'CatBoost_BAG_L2': -30.37793710395594,
'WeightedEnsemble_L3': -30.10688847519134},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20221226_142547/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20221226_142547/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20221226_142547/models/LightGBMXT_BAG_L1/',
'LightGBM_BAG_L1': 'AutogluonModels/ag-20221226_142547/models/LightGBM_BAG_L1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20221226_142547/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1': 'AutogluonModels/ag-20221226_142547/models/CatBoost_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20221226_142547/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20221226_142547/models/LightGBMXT_BAG_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20221226_142547/models/LightGBM_BAG_L2/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20221226_142547/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2': 'AutogluonModels/ag-20221226_142547/models/CatBoost_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20221226_142547/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.06801056861877441,
'KNeighborsDist_BAG_L1': 0.058534860610961914,
'LightGBMXT_BAG_L1': 105.5038058757782,
'LightGBM_BAG_L1': 52.791746854782104,
'RandomForestMSE_BAG_L1': 17.857386350631714,
'CatBoost_BAG_L1': 171.680278301239,
'WeightedEnsemble_L2': 0.6657960414886475,
'LightGBMXT_BAG_L2': 35.7026150226593,
'LightGBM_BAG_L2': 27.82102870941162,
'RandomForestMSE_BAG_L2': 29.85732340812683,
'CatBoost_BAG_L2': 79.53760838508606,
'WeightedEnsemble_L3': 0.29229116439819336},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.1079099178314209,
'KNeighborsDist_BAG_L1': 0.10761642456054688,
'LightGBMXT_BAG_L1': 17.43649673461914,
'LightGBM_BAG_L1': 3.4618468284606934,
'RandomForestMSE_BAG_L1': 0.903127908706665,
'CatBoost_BAG_L1': 0.15917611122131348,
'WeightedEnsemble_L2': 0.0012576580047607422,
'LightGBMXT_BAG_L2': 0.704827070236206,
'LightGBM_BAG_L2': 0.3125481605529785,
'RandomForestMSE_BAG_L2': 0.6809597015380859,
'CatBoost_BAG_L2': 0.1259317398071289,
'WeightedEnsemble_L3': 0.0011048316955566406},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -30.106888 24.001545 521.170630
1 CatBoost_BAG_L2 -30.377937 22.302106 427.497371
2 LightGBM_BAG_L2 -30.488781 22.488722 375.780792
3 LightGBMXT_BAG_L2 -31.127338 22.881001 383.662378
4 RandomForestMSE_BAG_L2 -31.354564 22.857134 377.817086
5 WeightedEnsemble_L2 -31.653175 22.069522 348.557548
6 LightGBMXT_BAG_L1 -32.972358 17.436497 105.503806
7 LightGBM_BAG_L1 -33.540630 3.461847 52.791747
8 CatBoost_BAG_L1 -34.223240 0.159176 171.680278
9 RandomForestMSE_BAG_L1 -38.283140 0.903128 17.857386
10 KNeighborsDist_BAG_L1 -84.125061 0.107616 0.058535
11 KNeighborsUnif_BAG_L1 -101.546199 0.107910 0.068011
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.001105 0.292291 3 True
1 0.125932 79.537608 2 True
2 0.312548 27.821029 2 True
3 0.704827 35.702615 2 True
4 0.680960 29.857323 2 True
5 0.001258 0.665796 2 True
6 17.436497 105.503806 1 True
7 3.461847 52.791747 1 True
8 0.159176 171.680278 1 True
9 0.903128 17.857386 1 True
10 0.107616 0.058535 1 True
11 0.107910 0.068011 1 True
fit_order
0 12
1 11
2 9
3 8
4 10
5 7
6 3
7 4
8 6
9 5
10 2
11 1 }
predictor_new_features_2a.leaderboard(silent=True)
| | model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WeightedEnsemble_L3 | -30.106888 | 24.001545 | 521.170630 | 0.001105 | 0.292291 | 3 | True | 12 |
| 1 | CatBoost_BAG_L2 | -30.377937 | 22.302106 | 427.497371 | 0.125932 | 79.537608 | 2 | True | 11 |
| 2 | LightGBM_BAG_L2 | -30.488781 | 22.488722 | 375.780792 | 0.312548 | 27.821029 | 2 | True | 9 |
| 3 | LightGBMXT_BAG_L2 | -31.127338 | 22.881001 | 383.662378 | 0.704827 | 35.702615 | 2 | True | 8 |
| 4 | RandomForestMSE_BAG_L2 | -31.354564 | 22.857134 | 377.817086 | 0.680960 | 29.857323 | 2 | True | 10 |
| 5 | WeightedEnsemble_L2 | -31.653175 | 22.069522 | 348.557548 | 0.001258 | 0.665796 | 2 | True | 7 |
| 6 | LightGBMXT_BAG_L1 | -32.972358 | 17.436497 | 105.503806 | 17.436497 | 105.503806 | 1 | True | 3 |
| 7 | LightGBM_BAG_L1 | -33.540630 | 3.461847 | 52.791747 | 3.461847 | 52.791747 | 1 | True | 4 |
| 8 | CatBoost_BAG_L1 | -34.223240 | 0.159176 | 171.680278 | 0.159176 | 171.680278 | 1 | True | 6 |
| 9 | RandomForestMSE_BAG_L1 | -38.283140 | 0.903128 | 17.857386 | 0.903128 | 17.857386 | 1 | True | 5 |
| 10 | KNeighborsDist_BAG_L1 | -84.125061 | 0.107616 | 0.058535 | 0.107616 | 0.058535 | 1 | True | 2 |
| 11 | KNeighborsUnif_BAG_L1 | -101.546199 | 0.107910 | 0.068011 | 0.107910 | 0.068011 | 1 | True | 1 |
fig = predictor_new_features_2a.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val").figure
fig.tight_layout()
fig.savefig('img/exp_2a_leaderboard.png')
# Remember to set all negative values to zero
predictions_new_features_2a = predictor_new_features_2a.predict(test)
predictions_new_features_2a.describe()
count    6493.000000
mean      157.081726
std       136.723343
min         2.417695
25%        51.513840
50%       120.013924
75%       223.343887
max       810.798950
Name: count, dtype: float64
(predictions_new_features_2a<0).sum()
0
predictions_new_features_2a = predictions_new_features_2a.clip(lower=0)
submission_new_features_2a = pd.read_csv("sampleSubmission.csv")
# Submit predictions the same way as before
submission_new_features_2a["count"] = predictions_new_features_2a
submission_new_features_2a.to_csv("submission_new_features_2a.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features_2a.csv -m "new features 2a"
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 10
fileName date description status publicScore privateScore
------------------------------ ------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ----------- ------------
submission_new_hpo_3e.csv 2022-12-26 13:59:46 hp tuning 3e complete 0.49307 0.49307
submission_new_hpo_3d.csv 2022-12-26 13:50:52 hp tuning 3d complete 0.52253 0.52253
submission_new_hpo_3c.csv 2022-12-26 13:46:38 hpo 3c num_bag_sets = 5 complete 0.62247 0.62247
submission_new_hpo_3b.csv 2022-12-26 13:35:38 hpo 3b num_bag_folds = 10 complete 0.63100 0.63100
submission_new_hpo_3a.csv 2022-12-26 13:24:35 hpo 3a num_stack_levels = 2 complete 0.66835 0.66835
submission_new_features_2b.csv 2022-12-26 13:13:10 new features 2b complete 0.65357 0.65357
submission_new_features_2a.csv 2022-12-26 12:35:34 new features 2a complete 0.62078 0.62078
submission.csv 2022-12-26 12:10:34 initial submission 1 complete 1.79067 1.79067
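For context, the public scores above are on a different scale from the validation RMSE in the training logs: the Bike Sharing Demand competition is scored with root mean squared logarithmic error (RMSLE), which compares predictions and targets in log space. A minimal sketch:

```python
import math

# Root mean squared logarithmic error, the metric Kaggle uses to score this
# competition (hence public scores near 0.5-0.7 while validation RMSE on raw
# counts is around 30).
def rmsle(y_true, y_pred):
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )
```

This is also why predictions must be non-negative before submitting: `log1p` of a negative count is undefined.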
AutoGluon's DatetimeFeatureGenerator already derives `('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']` from the raw `datetime` column, so several of our manually engineered columns are duplicates. In this experiment we drop `['datetime_month','datetime_day','datetime_dayofweek', 'datetime_year']` to check if removing the duplication improves final performance.
predictor_new_features_2b = TabularPredictor(
    label = 'count',
    eval_metric = 'root_mean_squared_error',
).fit(
    train_data = train.drop(columns = ['datetime_month','datetime_day','datetime_dayofweek','datetime_week', 'datetime_year']),
    time_limit = 600,
    presets='best_quality',
)
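The dropped columns mirror features that AutoGluon derives on its own; conceptually, each value comes straight from the timestamp (a standard-library sketch, not AutoGluon's implementation):

```python
from datetime import datetime

# What the year/month/day/dayofweek features amount to for a single timestamp.
ts = datetime(2011, 1, 1, 5)             # a row from the bike-sharing data
derived = {
    "datetime.year": ts.year,
    "datetime.month": ts.month,
    "datetime.day": ts.day,
    "datetime.dayofweek": ts.weekday(),  # Monday=0, matching pandas
}
```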
No path specified. Models will be saved in: "AutogluonModels/ag-20221226_124540/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221226_124540/"
AutoGluon Version: 0.6.1
Python Version: 3.7.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Oct 26 20:36:53 UTC 2022
Train Data Rows: 10886
Train Data Columns: 11
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 5263.32 MB
Train Data (Original) Memory Usage: 0.73 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Fitting DatetimeFeatureGenerator...
/usr/local/lib/python3.7/site-packages/autogluon/features/generators/datetime.py:59: FutureWarning: casting datetime64[ns, UTC] values to int64 with .astype(...) is deprecated and will raise in a future version. Use .view(...) instead.
good_rows = series[~series.isin(bad_rows)].astype(np.int64)
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('category', []) : 3 | ['season', 'weather', 'hour_category']
('datetime', []) : 1 | ['datetime']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['holiday', 'workingday', 'humidity', 'datetime_hour']
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 3 | ['season', 'weather', 'hour_category']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 2 | ['humidity', 'datetime_hour']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
0.2s = Fit runtime
11 features in original data used to generate 15 features in processed data.
Train Data (Processed) Memory Usage: 0.93 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.3s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.7s of the 599.7s of remaining time.
-101.5462 = Validation score (-root_mean_squared_error)
0.04s = Training runtime
0.11s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.32s of the 599.31s of remaining time.
-84.1251 = Validation score (-root_mean_squared_error)
0.03s = Training runtime
0.1s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 398.95s of the 598.94s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-33.0698 = Validation score (-root_mean_squared_error)
89.33s = Training runtime
17.9s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 299.87s of the 499.86s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-33.5413 = Validation score (-root_mean_squared_error)
47.52s = Training runtime
3.57s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 247.65s of the 447.65s of remaining time.
-38.3046 = Validation score (-root_mean_squared_error)
12.68s = Training runtime
0.63s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 231.88s of the 431.87s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-33.9384 = Validation score (-root_mean_squared_error)
199.87s = Training runtime
0.21s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 27.52s of the 227.51s of remaining time.
-37.8411 = Validation score (-root_mean_squared_error)
5.98s = Training runtime
0.61s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 18.38s of the 218.38s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-72.7884 = Validation score (-root_mean_squared_error)
41.24s = Training runtime
0.47s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 172.81s of remaining time.
-31.7363 = Validation score (-root_mean_squared_error)
0.5s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 172.23s of the 172.2s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-31.004 = Validation score (-root_mean_squared_error)
31.22s = Training runtime
1.01s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 135.93s of the 135.91s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-30.5483 = Validation score (-root_mean_squared_error)
26.1s = Training runtime
0.33s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 105.54s of the 105.51s of remaining time.
-31.3624 = Validation score (-root_mean_squared_error)
30.39s = Training runtime
0.67s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 72.05s of the 72.02s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-30.3886 = Validation score (-root_mean_squared_error)
71.45s = Training runtime
0.12s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -3.8s of remaining time.
-30.1509 = Validation score (-root_mean_squared_error)
0.29s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 604.3s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221226_124540/")
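As the log notes, AutoGluon flips the metric's sign so that higher is always better, so negating a leaderboard `score_val` recovers the actual RMSE. A minimal sketch of the underlying metric:

```python
import math

# Root mean squared error, the eval_metric used throughout this notebook.
def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# AutoGluon reports -RMSE as score_val; negate it to get the RMSE itself.
score_val = -30.150922          # WeightedEnsemble_L3 validation score
actual_rmse = -score_val        # 30.150922
```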
predictor_new_features_2b.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -30.150922 25.725962 556.149490 0.000877 0.293293 3 True 14
1 CatBoost_BAG_L2 -30.388608 23.719227 468.144321 0.119392 71.445114 2 True 13
2 LightGBM_BAG_L2 -30.548320 23.927661 422.797380 0.327826 26.098173 2 True 11
3 LightGBMXT_BAG_L2 -31.003969 24.606218 427.920316 1.006384 31.221109 2 True 10
4 RandomForestMSE_BAG_L2 -31.362428 24.271482 427.091802 0.671648 30.392595 2 True 12
5 WeightedEnsemble_L2 -31.736281 22.411067 349.937558 0.000895 0.503409 2 True 9
6 LightGBMXT_BAG_L1 -33.069780 17.895373 89.332585 17.895373 89.332585 1 True 3
7 LightGBM_BAG_L1 -33.541281 3.571695 47.522052 3.571695 47.522052 1 True 4
8 CatBoost_BAG_L1 -33.938428 0.207587 199.872878 0.207587 199.872878 1 True 6
9 ExtraTreesMSE_BAG_L1 -37.841138 0.608934 5.977964 0.608934 5.977964 1 True 7
10 RandomForestMSE_BAG_L1 -38.304598 0.631685 12.677532 0.631685 12.677532 1 True 5
11 NeuralNetFastAI_BAG_L1 -72.788383 0.474992 41.243789 0.474992 41.243789 1 True 8
12 KNeighborsDist_BAG_L1 -84.125061 0.103832 0.029102 0.103832 0.029102 1 True 2
13 KNeighborsUnif_BAG_L1 -101.546199 0.105737 0.043305 0.105737 0.043305 1 True 1
Number of models trained: 14
Types of models trained:
{'StackerEnsembleModel_NNFastAiTabular', 'WeightedEnsembleModel', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_CatBoost'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 3 | ['season', 'weather', 'hour_category']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 2 | ['humidity', 'datetime_hour']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221226_124540/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
'KNeighborsDist_BAG_L1': -84.12506123181602,
'LightGBMXT_BAG_L1': -33.06977986045687,
'LightGBM_BAG_L1': -33.541281049845416,
'RandomForestMSE_BAG_L1': -38.30459792418722,
'CatBoost_BAG_L1': -33.938427705209676,
'ExtraTreesMSE_BAG_L1': -37.84113795024617,
'NeuralNetFastAI_BAG_L1': -72.78838342939757,
'WeightedEnsemble_L2': -31.73628083370923,
'LightGBMXT_BAG_L2': -31.00396851804603,
'LightGBM_BAG_L2': -30.54831969680547,
'RandomForestMSE_BAG_L2': -31.362428429852514,
'CatBoost_BAG_L2': -30.388607844819283,
'WeightedEnsemble_L3': -30.150921718093382},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/LightGBMXT_BAG_L1/',
'LightGBM_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/LightGBM_BAG_L1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/CatBoost_BAG_L1/',
'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/ExtraTreesMSE_BAG_L1/',
'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20221226_124540/models/NeuralNetFastAI_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20221226_124540/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20221226_124540/models/LightGBMXT_BAG_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20221226_124540/models/LightGBM_BAG_L2/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20221226_124540/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2': 'AutogluonModels/ag-20221226_124540/models/CatBoost_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20221226_124540/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.043305397033691406,
'KNeighborsDist_BAG_L1': 0.02910161018371582,
'LightGBMXT_BAG_L1': 89.33258485794067,
'LightGBM_BAG_L1': 47.52205228805542,
'RandomForestMSE_BAG_L1': 12.677532434463501,
'CatBoost_BAG_L1': 199.87287783622742,
'ExtraTreesMSE_BAG_L1': 5.97796368598938,
'NeuralNetFastAI_BAG_L1': 41.243788957595825,
'WeightedEnsemble_L2': 0.503408670425415,
'LightGBMXT_BAG_L2': 31.22110891342163,
'LightGBM_BAG_L2': 26.098172664642334,
'RandomForestMSE_BAG_L2': 30.392594814300537,
'CatBoost_BAG_L2': 71.44511413574219,
'WeightedEnsemble_L3': 0.2932925224304199},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.10573673248291016,
'KNeighborsDist_BAG_L1': 0.10383224487304688,
'LightGBMXT_BAG_L1': 17.89537262916565,
'LightGBM_BAG_L1': 3.5716946125030518,
'RandomForestMSE_BAG_L1': 0.6316852569580078,
'CatBoost_BAG_L1': 0.20758748054504395,
'ExtraTreesMSE_BAG_L1': 0.608933687210083,
'NeuralNetFastAI_BAG_L1': 0.4749917984008789,
'WeightedEnsemble_L2': 0.0008945465087890625,
'LightGBMXT_BAG_L2': 1.0063836574554443,
'LightGBM_BAG_L2': 0.32782626152038574,
'RandomForestMSE_BAG_L2': 0.6716480255126953,
'CatBoost_BAG_L2': 0.11939239501953125,
'WeightedEnsemble_L3': 0.0008769035339355469},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -30.150922 25.725962 556.149490
1 CatBoost_BAG_L2 -30.388608 23.719227 468.144321
2 LightGBM_BAG_L2 -30.548320 23.927661 422.797380
3 LightGBMXT_BAG_L2 -31.003969 24.606218 427.920316
4 RandomForestMSE_BAG_L2 -31.362428 24.271482 427.091802
5 WeightedEnsemble_L2 -31.736281 22.411067 349.937558
6 LightGBMXT_BAG_L1 -33.069780 17.895373 89.332585
7 LightGBM_BAG_L1 -33.541281 3.571695 47.522052
8 CatBoost_BAG_L1 -33.938428 0.207587 199.872878
9 ExtraTreesMSE_BAG_L1 -37.841138 0.608934 5.977964
10 RandomForestMSE_BAG_L1 -38.304598 0.631685 12.677532
11 NeuralNetFastAI_BAG_L1 -72.788383 0.474992 41.243789
12 KNeighborsDist_BAG_L1 -84.125061 0.103832 0.029102
13 KNeighborsUnif_BAG_L1 -101.546199 0.105737 0.043305
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000877 0.293293 3 True
1 0.119392 71.445114 2 True
2 0.327826 26.098173 2 True
3 1.006384 31.221109 2 True
4 0.671648 30.392595 2 True
5 0.000895 0.503409 2 True
6 17.895373 89.332585 1 True
7 3.571695 47.522052 1 True
8 0.207587 199.872878 1 True
9 0.608934 5.977964 1 True
10 0.631685 12.677532 1 True
11 0.474992 41.243789 1 True
12 0.103832 0.029102 1 True
13 0.105737 0.043305 1 True
fit_order
0 14
1 13
2 11
3 10
4 12
5 9
6 3
7 4
8 6
9 7
10 5
11 8
12 2
13 1 }
predictor_new_features_2b.leaderboard(silent=True)
| model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WeightedEnsemble_L3 | -30.150922 | 25.725962 | 556.149490 | 0.000877 | 0.293293 | 3 | True | 14 |
| 1 | CatBoost_BAG_L2 | -30.388608 | 23.719227 | 468.144321 | 0.119392 | 71.445114 | 2 | True | 13 |
| 2 | LightGBM_BAG_L2 | -30.548320 | 23.927661 | 422.797380 | 0.327826 | 26.098173 | 2 | True | 11 |
| 3 | LightGBMXT_BAG_L2 | -31.003969 | 24.606218 | 427.920316 | 1.006384 | 31.221109 | 2 | True | 10 |
| 4 | RandomForestMSE_BAG_L2 | -31.362428 | 24.271482 | 427.091802 | 0.671648 | 30.392595 | 2 | True | 12 |
| 5 | WeightedEnsemble_L2 | -31.736281 | 22.411067 | 349.937558 | 0.000895 | 0.503409 | 2 | True | 9 |
| 6 | LightGBMXT_BAG_L1 | -33.069780 | 17.895373 | 89.332585 | 17.895373 | 89.332585 | 1 | True | 3 |
| 7 | LightGBM_BAG_L1 | -33.541281 | 3.571695 | 47.522052 | 3.571695 | 47.522052 | 1 | True | 4 |
| 8 | CatBoost_BAG_L1 | -33.938428 | 0.207587 | 199.872878 | 0.207587 | 199.872878 | 1 | True | 6 |
| 9 | ExtraTreesMSE_BAG_L1 | -37.841138 | 0.608934 | 5.977964 | 0.608934 | 5.977964 | 1 | True | 7 |
| 10 | RandomForestMSE_BAG_L1 | -38.304598 | 0.631685 | 12.677532 | 0.631685 | 12.677532 | 1 | True | 5 |
| 11 | NeuralNetFastAI_BAG_L1 | -72.788383 | 0.474992 | 41.243789 | 0.474992 | 41.243789 | 1 | True | 8 |
| 12 | KNeighborsDist_BAG_L1 | -84.125061 | 0.103832 | 0.029102 | 0.103832 | 0.029102 | 1 | True | 2 |
| 13 | KNeighborsUnif_BAG_L1 | -101.546199 | 0.105737 | 0.043305 | 0.105737 | 0.043305 | 1 | True | 1 |
fig = predictor_new_features_2b.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val").figure
fig.tight_layout()
fig.savefig('img/exp_2b_leaderboard.png')
# Remember to set all negative values to zero
predictions_new_features_2b = predictor_new_features_2b.predict(test)
predictions_new_features_2b.describe()
count    6493.000000
mean      156.803406
std       135.177689
min         2.408113
25%        51.911545
50%       122.218353
75%       226.373245
max       794.151550
Name: count, dtype: float64
(predictions_new_features_2b<0).sum()
0
predictions_new_features_2b = predictions_new_features_2b.clip(lower=0)
submission_new_features_2b = pd.read_csv("sampleSubmission.csv")
# Submit predictions, same as before
submission_new_features_2b["count"] = predictions_new_features_2b
submission_new_features_2b.to_csv("submission_new_features_2b.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features_2b.csv -m "new features 2b"
100%|█████████████████████████████████████████| 243k/243k [00:00<00:00, 434kB/s] Successfully submitted to Bike Sharing Demand
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 10
fileName date description status publicScore privateScore
------------------------------ ------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ----------- ------------
submission_new_features_2b.csv 2022-12-26 13:13:10 new features 2b complete 0.65357 0.65357
submission_new_features_2a.csv 2022-12-26 12:35:34 new features 2a complete 0.62078 0.62078
submission.csv 2022-12-26 12:10:34 initial submission 1 complete 1.79067 1.79067
submission_new_hpo_3f.csv 2022-12-25 20:15:51 hp tuning 3f complete 0.50219 0.50219
submission_new_hpo_3e.csv 2022-12-25 19:59:07 hp tuning 3e complete 0.53176 0.53176
submission_new_hpo_3c.csv 2022-12-25 19:35:06 hpo 3c num_bag_sets = 5 complete 0.63215 0.63215
submission_new_hpo_3b.csv 2022-12-25 19:23:27 hpo 3b num_bag_folds = 10 complete 0.79137 0.79137
submission_new_hpo_3a.csv 2022-12-25 18:45:12 hpo 3a num_stack_levels = 2 complete 0.67217 0.67217
Hyperparameter optimization is configured by passing search spaces through the `hyperparameters` and `hyperparameter_tune_kwargs` arguments of `fit()`.
import autogluon.core as ag
nn_options = {
'num_epochs': 10,
'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),
'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),
'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),
}
gbm_options = {
'num_boost_round': ag.space.Int(lower = 100, upper = 1000),
'num_leaves': ag.space.Int(lower=26, upper=66, default=36),
}
rf_options = {
    'n_estimators': ag.space.Int(lower=150, upper=500)
}
xt_options = {
    'n_estimators': ag.space.Int(lower=150, upper=500)
}
cat_options = {
    'iterations': ag.space.Int(lower=1000, upper=10000)
}
hyperparameters = {  # hyperparameters of each model type
    'GBM': gbm_options,
    'NN_TORCH': nn_options,  # NOTE: comment this line out if you get errors on Mac OSX
    'RF': rf_options,
    'XT': xt_options,
    'CAT': cat_options,
}  # When a key is missing from the hyperparameters dict, no models of that type are trained
time_limit = 10*60  # train various models for up to ~10 min in total
num_trials = 20  # try at most 20 different hyperparameter configurations per model type
search_strategy = 'bayes'  # tune hyperparameters with a Bayesian optimization searcher
label = 'count'
metric = 'root_mean_squared_error'
hyperparameter_tune_kwargs = { # HPO is not performed unless hyperparameter_tune_kwargs is specified
'num_trials': num_trials,
'scheduler' : 'local',
'searcher': search_strategy,
}
predictor_hpo_3e = TabularPredictor(
label=label,
eval_metric=metric
).fit(
train_data = train,
auto_stack=True,
num_stack_levels=1,
num_bag_folds=8,
num_bag_sets=5,
time_limit=time_limit,
hyperparameters=hyperparameters,
hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)
No model was trained during hyperparameter tuning NeuralNetTorch_BAG_L2... Skipping this model.
Completed 1/5 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 74.96s of remaining time.
-33.158 = Validation score (-root_mean_squared_error)
0.38s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 525.63s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221226_135054/")
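With `num_trials = 20`, the searcher evaluates up to 20 sampled configurations per model type, each drawn from the spaces defined above; `ag.space.Real(..., log=True)` is sampled log-uniformly. A plain-Python illustration of the shape of one such trial (random sampling for illustration only, not AutoGluon's 'bayes' searcher):

```python
import math
import random

random.seed(0)

def sample_nn_trial():
    # Mirrors nn_options: log-uniform learning rate, categorical activation,
    # uniform dropout probability.
    lo, hi = 1e-4, 1e-2
    return {
        "learning_rate": math.exp(random.uniform(math.log(lo), math.log(hi))),
        "activation": random.choice(["relu", "softrelu", "tanh"]),
        "dropout_prob": random.uniform(0.0, 0.5),
    }

trials = [sample_nn_trial() for _ in range(20)]
```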
predictor_hpo_3e.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 -33.038009 0.001529 135.658681 0.000880 0.846915 2 True 15
1 WeightedEnsemble_L3 -33.158029 0.003626 385.054592 0.000834 0.378233 3 True 22
2 LightGBM_BAG_L2/T1 -33.484871 0.002360 292.803847 0.000132 25.689138 2 True 16
3 ExtraTrees_BAG_L2/T2 -33.565096 0.002403 283.678812 0.000175 16.564104 2 True 20
4 LightGBM_BAG_L1/T2 -33.609296 0.000131 33.872199 0.000131 33.872199 1 True 2
5 ExtraTrees_BAG_L2/T3 -33.730641 0.002363 277.433658 0.000135 10.318949 2 True 21
6 ExtraTrees_BAG_L2/T1 -33.756812 0.002407 275.210723 0.000179 8.096014 2 True 19
7 CatBoost_BAG_L2/T1 -33.821450 0.002318 317.492665 0.000090 50.377956 2 True 18
8 RandomForest_BAG_L2/T1 -33.991969 0.002394 292.045161 0.000166 24.930453 2 True 17
9 CatBoost_BAG_L1/T1 -36.826482 0.000135 71.848728 0.000135 71.848728 1 True 7
10 ExtraTrees_BAG_L1/T6 -37.282208 0.000216 16.457945 0.000216 16.457945 1 True 13
11 ExtraTrees_BAG_L1/T7 -37.387917 0.000124 14.057925 0.000124 14.057925 1 True 14
12 ExtraTrees_BAG_L1/T5 -37.424838 0.000138 11.915664 0.000138 11.915664 1 True 12
13 ExtraTrees_BAG_L1/T2 -37.427425 0.000157 10.959226 0.000157 10.959226 1 True 9
14 ExtraTrees_BAG_L1/T4 -37.506339 0.000182 9.332303 0.000182 9.332303 1 True 11
15 ExtraTrees_BAG_L1/T3 -37.648697 0.000166 6.704234 0.000166 6.704234 1 True 10
16 ExtraTrees_BAG_L1/T1 -37.917176 0.000160 5.374717 0.000160 5.374717 1 True 8
17 RandomForest_BAG_L1/T2 -38.254801 0.000247 21.524785 0.000247 21.524785 1 True 4
18 RandomForest_BAG_L1/T4 -38.329116 0.000154 17.215766 0.000154 17.215766 1 True 6
19 RandomForest_BAG_L1/T3 -38.341621 0.000166 12.632894 0.000166 12.632894 1 True 5
20 RandomForest_BAG_L1/T1 -38.490504 0.000159 9.875644 0.000159 9.875644 1 True 3
21 LightGBM_BAG_L1/T1 -39.654394 0.000092 25.342679 0.000092 25.342679 1 True 1
Number of models trained: 22
Types of models trained:
{'WeightedEnsembleModel', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_CatBoost'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 3 | ['season', 'weather', 'hour_category']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 6 | ['humidity', 'datetime_hour', 'datetime_day', 'datetime_month', 'datetime_dayofweek', ...]
('int', ['bool']) : 3 | ['holiday', 'workingday', 'datetime_year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221226_135054/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T2': 'StackerEnsembleModel_LGB',
'RandomForest_BAG_L1/T1': 'StackerEnsembleModel_RF',
'RandomForest_BAG_L1/T2': 'StackerEnsembleModel_RF',
'RandomForest_BAG_L1/T3': 'StackerEnsembleModel_RF',
'RandomForest_BAG_L1/T4': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1/T1': 'StackerEnsembleModel_CatBoost',
'ExtraTrees_BAG_L1/T1': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L1/T2': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L1/T3': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L1/T4': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L1/T5': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L1/T6': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L1/T7': 'StackerEnsembleModel_XT',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
'RandomForest_BAG_L2/T1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2/T1': 'StackerEnsembleModel_CatBoost',
'ExtraTrees_BAG_L2/T1': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L2/T2': 'StackerEnsembleModel_XT',
'ExtraTrees_BAG_L2/T3': 'StackerEnsembleModel_XT',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'LightGBM_BAG_L1/T1': -39.65439439248688,
'LightGBM_BAG_L1/T2': -33.609296000838235,
'RandomForest_BAG_L1/T1': -38.49050354468141,
'RandomForest_BAG_L1/T2': -38.254801437961284,
'RandomForest_BAG_L1/T3': -38.3416208647323,
'RandomForest_BAG_L1/T4': -38.329115887810964,
'CatBoost_BAG_L1/T1': -36.82648237440223,
'ExtraTrees_BAG_L1/T1': -37.91717566332131,
'ExtraTrees_BAG_L1/T2': -37.427425158772785,
'ExtraTrees_BAG_L1/T3': -37.648697000010955,
'ExtraTrees_BAG_L1/T4': -37.50633946452237,
'ExtraTrees_BAG_L1/T5': -37.424837832670775,
'ExtraTrees_BAG_L1/T6': -37.28220801334453,
'ExtraTrees_BAG_L1/T7': -37.38791710160737,
'WeightedEnsemble_L2': -33.038008809932116,
'LightGBM_BAG_L2/T1': -33.484870712032716,
'RandomForest_BAG_L2/T1': -33.99196891368665,
'CatBoost_BAG_L2/T1': -33.82145035992648,
'ExtraTrees_BAG_L2/T1': -33.756811756093086,
'ExtraTrees_BAG_L2/T2': -33.56509633349596,
'ExtraTrees_BAG_L2/T3': -33.7306409117612,
'WeightedEnsemble_L3': -33.15802896825262},
'model_best': 'WeightedEnsemble_L2',
'model_paths': {'LightGBM_BAG_L1/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/LightGBM_BAG_L1/T1/',
'LightGBM_BAG_L1/T2': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/LightGBM_BAG_L1/T2/',
'RandomForest_BAG_L1/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/RandomForest_BAG_L1/T1/',
'RandomForest_BAG_L1/T2': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/RandomForest_BAG_L1/T2/',
'RandomForest_BAG_L1/T3': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/RandomForest_BAG_L1/T3/',
'RandomForest_BAG_L1/T4': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/RandomForest_BAG_L1/T4/',
'CatBoost_BAG_L1/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/CatBoost_BAG_L1/T1/',
'ExtraTrees_BAG_L1/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L1/T1/',
'ExtraTrees_BAG_L1/T2': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L1/T2/',
'ExtraTrees_BAG_L1/T3': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L1/T3/',
'ExtraTrees_BAG_L1/T4': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L1/T4/',
'ExtraTrees_BAG_L1/T5': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L1/T5/',
'ExtraTrees_BAG_L1/T6': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L1/T6/',
'ExtraTrees_BAG_L1/T7': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L1/T7/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20221226_135054/models/WeightedEnsemble_L2/',
'LightGBM_BAG_L2/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/LightGBM_BAG_L2/T1/',
'RandomForest_BAG_L2/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/RandomForest_BAG_L2/T1/',
'CatBoost_BAG_L2/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/CatBoost_BAG_L2/T1/',
'ExtraTrees_BAG_L2/T1': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L2/T1/',
'ExtraTrees_BAG_L2/T2': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L2/T2/',
'ExtraTrees_BAG_L2/T3': '/root/aws_mle_nanodegree/project_1/AutogluonModels/ag-20221226_135054/models/ExtraTrees_BAG_L2/T3/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20221226_135054/models/WeightedEnsemble_L3/'},
'model_fit_times': {'LightGBM_BAG_L1/T1': 25.342678785324097,
'LightGBM_BAG_L1/T2': 33.872198820114136,
'RandomForest_BAG_L1/T1': 9.875643730163574,
'RandomForest_BAG_L1/T2': 21.52478528022766,
'RandomForest_BAG_L1/T3': 12.632893800735474,
'RandomForest_BAG_L1/T4': 17.215765953063965,
'CatBoost_BAG_L1/T1': 71.84872794151306,
'ExtraTrees_BAG_L1/T1': 5.3747169971466064,
'ExtraTrees_BAG_L1/T2': 10.95922589302063,
'ExtraTrees_BAG_L1/T3': 6.704233884811401,
'ExtraTrees_BAG_L1/T4': 9.332302808761597,
'ExtraTrees_BAG_L1/T5': 11.915664434432983,
'ExtraTrees_BAG_L1/T6': 16.457945346832275,
'ExtraTrees_BAG_L1/T7': 14.05792498588562,
'WeightedEnsemble_L2': 0.8469147682189941,
'LightGBM_BAG_L2/T1': 25.689138412475586,
'RandomForest_BAG_L2/T1': 24.930452585220337,
'CatBoost_BAG_L2/T1': 50.37795615196228,
'ExtraTrees_BAG_L2/T1': 8.096014261245728,
'ExtraTrees_BAG_L2/T2': 16.564103603363037,
'ExtraTrees_BAG_L2/T3': 10.318948984146118,
'WeightedEnsemble_L3': 0.3782329559326172},
'model_pred_times': {'LightGBM_BAG_L1/T1': 9.226799011230469e-05,
'LightGBM_BAG_L1/T2': 0.0001308917999267578,
'RandomForest_BAG_L1/T1': 0.0001590251922607422,
'RandomForest_BAG_L1/T2': 0.0002465248107910156,
'RandomForest_BAG_L1/T3': 0.00016641616821289062,
'RandomForest_BAG_L1/T4': 0.00015354156494140625,
'CatBoost_BAG_L1/T1': 0.0001354217529296875,
'ExtraTrees_BAG_L1/T1': 0.00015997886657714844,
'ExtraTrees_BAG_L1/T2': 0.00015735626220703125,
'ExtraTrees_BAG_L1/T3': 0.00016617774963378906,
'ExtraTrees_BAG_L1/T4': 0.0001819133758544922,
'ExtraTrees_BAG_L1/T5': 0.0001380443572998047,
'ExtraTrees_BAG_L1/T6': 0.0002155303955078125,
'ExtraTrees_BAG_L1/T7': 0.00012445449829101562,
'WeightedEnsemble_L2': 0.0008804798126220703,
'LightGBM_BAG_L2/T1': 0.0001323223114013672,
'RandomForest_BAG_L2/T1': 0.00016617774963378906,
'CatBoost_BAG_L2/T1': 9.036064147949219e-05,
'ExtraTrees_BAG_L2/T1': 0.000179290771484375,
'ExtraTrees_BAG_L2/T2': 0.00017547607421875,
'ExtraTrees_BAG_L2/T3': 0.0001354217529296875,
'WeightedEnsemble_L3': 0.0008342266082763672},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'LightGBM_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForest_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'RandomForest_BAG_L1/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'RandomForest_BAG_L1/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'RandomForest_BAG_L1/T4': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTrees_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L1/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L1/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L1/T4': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L1/T5': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L1/T6': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L1/T7': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForest_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTrees_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L2/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'ExtraTrees_BAG_L2/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L2 -33.038009 0.001529 135.658681
1 WeightedEnsemble_L3 -33.158029 0.003626 385.054592
2 LightGBM_BAG_L2/T1 -33.484871 0.002360 292.803847
3 ExtraTrees_BAG_L2/T2 -33.565096 0.002403 283.678812
4 LightGBM_BAG_L1/T2 -33.609296 0.000131 33.872199
5 ExtraTrees_BAG_L2/T3 -33.730641 0.002363 277.433658
6 ExtraTrees_BAG_L2/T1 -33.756812 0.002407 275.210723
7 CatBoost_BAG_L2/T1 -33.821450 0.002318 317.492665
8 RandomForest_BAG_L2/T1 -33.991969 0.002394 292.045161
9 CatBoost_BAG_L1/T1 -36.826482 0.000135 71.848728
10 ExtraTrees_BAG_L1/T6 -37.282208 0.000216 16.457945
11 ExtraTrees_BAG_L1/T7 -37.387917 0.000124 14.057925
12 ExtraTrees_BAG_L1/T5 -37.424838 0.000138 11.915664
13 ExtraTrees_BAG_L1/T2 -37.427425 0.000157 10.959226
14 ExtraTrees_BAG_L1/T4 -37.506339 0.000182 9.332303
15 ExtraTrees_BAG_L1/T3 -37.648697 0.000166 6.704234
16 ExtraTrees_BAG_L1/T1 -37.917176 0.000160 5.374717
17 RandomForest_BAG_L1/T2 -38.254801 0.000247 21.524785
18 RandomForest_BAG_L1/T4 -38.329116 0.000154 17.215766
19 RandomForest_BAG_L1/T3 -38.341621 0.000166 12.632894
20 RandomForest_BAG_L1/T1 -38.490504 0.000159 9.875644
21 LightGBM_BAG_L1/T1 -39.654394 0.000092 25.342679
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000880 0.846915 2 True
1 0.000834 0.378233 3 True
2 0.000132 25.689138 2 True
3 0.000175 16.564104 2 True
4 0.000131 33.872199 1 True
5 0.000135 10.318949 2 True
6 0.000179 8.096014 2 True
7 0.000090 50.377956 2 True
8 0.000166 24.930453 2 True
9 0.000135 71.848728 1 True
10 0.000216 16.457945 1 True
11 0.000124 14.057925 1 True
12 0.000138 11.915664 1 True
13 0.000157 10.959226 1 True
14 0.000182 9.332303 1 True
15 0.000166 6.704234 1 True
16 0.000160 5.374717 1 True
17 0.000247 21.524785 1 True
18 0.000154 17.215766 1 True
19 0.000166 12.632894 1 True
20 0.000159 9.875644 1 True
21 0.000092 25.342679 1 True
fit_order
0 15
1 22
2 16
3 20
4 2
5 21
6 19
7 18
8 17
9 7
10 13
11 14
12 12
13 9
14 11
15 10
16 8
17 4
18 6
19 5
20 3
21 1 }
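The `model_performance` dict in the fit summary above can be queried directly to confirm which model scored best on validation. A minimal sketch, using a small hand-copied subset of the validation scores reported above (AutoGluon negates RMSE so that higher is always better):

```python
# Subset of the validation scores from the fit summary / leaderboard above
# (AutoGluon reports error metrics negated, so higher means better).
model_performance = {
    "WeightedEnsemble_L2": -33.038009,
    "WeightedEnsemble_L3": -33.158029,
    "LightGBM_BAG_L1/T2": -33.609296,
    "CatBoost_BAG_L1/T1": -36.826482,
}

# The best model is the one with the highest (least negative) score.
best_model = max(model_performance, key=model_performance.get)
print(best_model)  # → WeightedEnsemble_L2
```

This matches the `'model_best': 'WeightedEnsemble_L2'` entry reported by `fit_summary()` above.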
predictor_hpo_3e.leaderboard(silent=True)
| | model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WeightedEnsemble_L2 | -33.038009 | 0.001529 | 135.658681 | 0.000880 | 0.846915 | 2 | True | 15 |
| 1 | WeightedEnsemble_L3 | -33.158029 | 0.003626 | 385.054592 | 0.000834 | 0.378233 | 3 | True | 22 |
| 2 | LightGBM_BAG_L2/T1 | -33.484871 | 0.002360 | 292.803847 | 0.000132 | 25.689138 | 2 | True | 16 |
| 3 | ExtraTrees_BAG_L2/T2 | -33.565096 | 0.002403 | 283.678812 | 0.000175 | 16.564104 | 2 | True | 20 |
| 4 | LightGBM_BAG_L1/T2 | -33.609296 | 0.000131 | 33.872199 | 0.000131 | 33.872199 | 1 | True | 2 |
| 5 | ExtraTrees_BAG_L2/T3 | -33.730641 | 0.002363 | 277.433658 | 0.000135 | 10.318949 | 2 | True | 21 |
| 6 | ExtraTrees_BAG_L2/T1 | -33.756812 | 0.002407 | 275.210723 | 0.000179 | 8.096014 | 2 | True | 19 |
| 7 | CatBoost_BAG_L2/T1 | -33.821450 | 0.002318 | 317.492665 | 0.000090 | 50.377956 | 2 | True | 18 |
| 8 | RandomForest_BAG_L2/T1 | -33.991969 | 0.002394 | 292.045161 | 0.000166 | 24.930453 | 2 | True | 17 |
| 9 | CatBoost_BAG_L1/T1 | -36.826482 | 0.000135 | 71.848728 | 0.000135 | 71.848728 | 1 | True | 7 |
| 10 | ExtraTrees_BAG_L1/T6 | -37.282208 | 0.000216 | 16.457945 | 0.000216 | 16.457945 | 1 | True | 13 |
| 11 | ExtraTrees_BAG_L1/T7 | -37.387917 | 0.000124 | 14.057925 | 0.000124 | 14.057925 | 1 | True | 14 |
| 12 | ExtraTrees_BAG_L1/T5 | -37.424838 | 0.000138 | 11.915664 | 0.000138 | 11.915664 | 1 | True | 12 |
| 13 | ExtraTrees_BAG_L1/T2 | -37.427425 | 0.000157 | 10.959226 | 0.000157 | 10.959226 | 1 | True | 9 |
| 14 | ExtraTrees_BAG_L1/T4 | -37.506339 | 0.000182 | 9.332303 | 0.000182 | 9.332303 | 1 | True | 11 |
| 15 | ExtraTrees_BAG_L1/T3 | -37.648697 | 0.000166 | 6.704234 | 0.000166 | 6.704234 | 1 | True | 10 |
| 16 | ExtraTrees_BAG_L1/T1 | -37.917176 | 0.000160 | 5.374717 | 0.000160 | 5.374717 | 1 | True | 8 |
| 17 | RandomForest_BAG_L1/T2 | -38.254801 | 0.000247 | 21.524785 | 0.000247 | 21.524785 | 1 | True | 4 |
| 18 | RandomForest_BAG_L1/T4 | -38.329116 | 0.000154 | 17.215766 | 0.000154 | 17.215766 | 1 | True | 6 |
| 19 | RandomForest_BAG_L1/T3 | -38.341621 | 0.000166 | 12.632894 | 0.000166 | 12.632894 | 1 | True | 5 |
| 20 | RandomForest_BAG_L1/T1 | -38.490504 | 0.000159 | 9.875644 | 0.000159 | 9.875644 | 1 | True | 3 |
| 21 | LightGBM_BAG_L1/T1 | -39.654394 | 0.000092 | 25.342679 | 0.000092 | 25.342679 | 1 | True | 1 |
fig = predictor_hpo_3e.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val").figure
fig.tight_layout()
fig.savefig('img/exp_3e_leaderboard.png')
# Remember to set all negative values to zero
predictions_hpo_3e = predictor_hpo_3e.predict(test)
predictions_hpo_3e.describe()
count    6493.000000
mean      192.521133
std       173.963516
min        -9.803270
25%        47.315571
50%       151.798645
75%       285.074310
max       891.959351
Name: count, dtype: float64
(predictions_hpo_3e<0).sum()
59
predictions_hpo_3e = predictions_hpo_3e.apply(lambda x: 0 if x<0 else x)
predictions_hpo_3e.describe()
count    6493.000000
mean      192.541119
std       173.941145
min         0.000000
25%        47.315571
50%       151.798645
75%       285.074310
max       891.959351
Name: count, dtype: float64
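As an aside, the same clamping can be done more idiomatically with pandas' built-in `Series.clip`, which avoids the per-element `apply`. A small sketch on toy values (not the actual predictions):

```python
import pandas as pd

# Toy predictions containing a negative value, mirroring the issue above.
preds = pd.Series([-9.8, 47.3, 151.8, 891.9])

# clip(lower=0) replaces anything below 0 with 0, vectorised.
preds = preds.clip(lower=0)
print(preds.min())  # → 0.0
```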
submission_new_hpo_3e = pd.read_csv("sampleSubmission.csv")
# Submit predictions as before
submission_new_hpo_3e["count"] = predictions_hpo_3e
submission_new_hpo_3e.to_csv("submission_new_hpo_3e.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo_3e.csv -m "hp tuning 3e"
100%|█████████████████████████████████████████| 242k/242k [00:00<00:00, 487kB/s] Successfully submitted to Bike Sharing Demand
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 10
fileName date description status publicScore privateScore
------------------------------ ------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ----------- ------------
submission_new_hpo_3e.csv 2022-12-26 13:59:46 hp tuning 3e complete 0.49307 0.49307
submission_new_hpo_3d.csv 2022-12-26 13:50:52 hp tuning 3d complete 0.52253 0.52253
submission_new_hpo_3c.csv 2022-12-26 13:46:38 hpo 3c num_bag_sets = 5 complete 0.62247 0.62247
submission_new_hpo_3b.csv 2022-12-26 13:35:38 hpo 3b num_bag_folds = 10 complete 0.63100 0.63100
submission_new_hpo_3a.csv 2022-12-26 13:24:35 hpo 3a num_stack_levels = 2 complete 0.66835 0.66835
submission_new_features_2b.csv 2022-12-26 13:13:10 new features 2b complete 0.65357 0.65357
submission_new_features_2a.csv 2022-12-26 12:35:34 new features 2a complete 0.62078 0.62078
submission.csv 2022-12-26 12:10:34 initial submission 1 complete 1.79067 1.79067
# Take the four Kaggle scores and create a line plot to show the improvement
import pandas as pd
import matplotlib.pyplot as plt
fig = pd.DataFrame(
{
"test_eval": ["initial_1", "add_features_2a", "add_features_2b", "hpo_3e"],
"score": [1.79033, 0.62078, 0.65357, 0.49307 ]
}
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
plt.xticks(rotation=45)
plt.grid(alpha=0.3)
fig.savefig('img/model_test_score.png')
# The hyperparameters tuned in each experiment, with the Kaggle score as the result
hpo_table = pd.DataFrame({
"model": ["initial_1", "add_features_2a", "add_features_2b", "hpo_3e"],
"num_stack_levels": [1,1,1,1],
"num_bag_folds": [8, 8, 8, 8],
"num_bag_sets": [20, 20, 20, 5],
"models_hpo": ['default', "default","default","optimized"],
"score": [1.79033, 0.62078, 0.65357, 0.49307 ]
})
hpo_table
| | model | num_stack_levels | num_bag_folds | num_bag_sets | models_hpo | score |
|---|---|---|---|---|---|---|
| 0 | initial_1 | 1 | 8 | 20 | default | 1.79067 |
| 1 | add_features_2a | 1 | 8 | 20 | default | 0.62078 |
| 2 | add_features_2b | 1 | 8 | 20 | default | 0.65357 |
| 3 | hpo_3e | 1 | 8 | 5 | optimized | 0.49307 |
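The relative improvement between submissions can also be quantified. A quick sketch using the public scores from the Kaggle submissions listing above (lower RMSLE is better on this competition):

```python
# Kaggle public scores from the submissions listing above (lower is better).
scores = {
    "initial_1": 1.79067,
    "add_features_2a": 0.62078,
    "add_features_2b": 0.65357,
    "hpo_3e": 0.49307,
}

# Percentage improvement of the final tuned model over the initial submission.
improvement = (scores["initial_1"] - scores["hpo_3e"]) / scores["initial_1"] * 100
print(f"{improvement:.1f}%")  # → 72.5%
```

Most of the gain came from feature engineering (2a), with hyperparameter tuning (3e) contributing a further meaningful reduction.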